A REVIEW OF THE STATISTICAL EVIDENCE ON
THE ASSOCIATION BETWEEN SMOKING
AND LUNG CANCER


SIDNEY J. CUTLER


National Cancer Institute, United States Public Health Service 


Leading investigators generally agree that a significant part 
of the observed increase in lung cancer mortality represents a 
real increase in the rate at which lung cancer is developing in 
the population. This increase cannot reasonably be attributed 
to genetic change in the human population and therefore 
must be due to environmental factors. Available evidence 
linking tobacco smoking to lung cancer is fairly extensive and 
impressive: (1) The increase in lung cancer mortality has been 
generally parallel to an increase in cigarette consumption; (2) 
In each of 14 case history studies there was a smaller percent- 
age of non-smokers and a higher percentage of heavy smokers 
among lung cancer patients than among comparable controls; 
(3) Preliminary results of two population studies indicate 
higher mortality from lung cancer among smokers than 
among non-smokers and a still higher mortality among heavy 
smokers; and (4) At least one team of investigators has pro- 
duced skin cancer in animals with condensates of tobacco 
smoke. 

There is disagreement whether the evidence at hand war- 
rants a conclusion that smoking and lung cancer are causally 
related. The relative importance of smoking, air pollution, 
and occupational exposure to cancerigenic materials remains 
to be established. 


DURING the first half of the 20th century lung cancer emerged from
relative obscurity to become an important cause of death in at
least a dozen countries [12]. The magnitude of the increase has been 
most unusual for a so-called chronic disease. During the last two dec- 
ades, alone, the rate of lung cancer mortality in the United States in- 
creased by 400 per cent. In 1930 less than 3,000 deaths in the United 
States were ascribed to lung cancer; preliminary statistics for 1953 
attribute about 23,000 deaths to this disease. More men now die from 
lung cancer than from cancer of any other site. 

Part of the reported increase is undoubtedly due to improved diagnostic
techniques and to greater alertness on the part of physicians.
Some investigators believe that the observed increase in lung cancer 
mortality is fictitious and due entirely to improved case finding. This 
position is incompatible with the following facts: 

1. The relative increase in males has been much greater than in 
females. Between 1930 and 1950 the increase in lung cancer mortality 
in the United States was more than four times as great in males as in 
females. It seems unreasonable to assume that improved diagnostic 
techniques have been applied to a much greater extent to men than to 
women. 

2. The relative increase was greater in old than in young people, 
especially among men. For example, the increase in lung cancer mor- 
tality was more than five times as large for white men aged 75-84 as 
for those between 35 and 44 years of age. It doesn’t seem reasonable 
that the quality of diagnosis is materially affected by the age of the 
patient. 

3. Lung cancer mortality is continuing to increase without benefit 
of any marked improvement in diagnostic techniques in recent years. 


The consensus among leading investigators is that a significant part of 
the observed increase is absolute and represents a real increase in the 
rate at which lung cancer is developing in the population [3].¹

The lung cancer death rate in males is now about 4.1 times the rate 
in females. Prior to age 40, lung cancer is rare in persons of both sexes 
and occurs with increasing frequency during late adult life and old age. 
However, the manner in which the lung cancer mortality rate increases 
with age is markedly different for the two sexes. Among males, the 
mortality curve rises very rapidly between ages 40 and 70 and then 
declines almost as rapidly. In contrast, the curve for females resembles 
that for all forms of cancer combined, showing a slower but steady rise 
over the entire life span. 

Lung cancer mortality rates for the two sexes did not always differ 
so much in the past. In 1914,² the rates for males and females were at





1 Comprehensive data on the incidence of cancer are scarce and available information covers a
relatively short period. One of the most extensive sets of data is provided by the studies of the National 
Cancer Institute. Ten metropolitan areas of the United States were surveyed in 1937-39 and resurveyed 
in 1947-48. The data pertaining to the incidence of lung cancer are in line with mortality data for the 
United States. During the period between the two surveys, the incidence of lung cancer increased by 
119 per cent among males and by 67 per cent among females (based on age-adjusted rates). During the
latter period, lung cancer occurred four and one half times as frequently in males as in females, compared 
to a male-female ratio of 3.4 during the earlier period. 

2 Deaths due to cancer of the lung were not identified separately in United States mortality statistics 
until 1914 and were not routinely tabulated until 1930. 
about the same level and the curve of age-specific rates for each sex was
characterized by a slow rise from early adult life until extreme old age. 
Since then lung cancer mortality rates have increased much more 
rapidly for males than for females and the shape of the mortality curve 
for males has changed. The rates for males no longer increase through- 
out the lifespan, as do the rates for most forms of cancer; they now drop 
off sharply from about age 70 [11]. 

The age-specific rates for each year are based on deaths among per- 
sons of different ages and hence on deaths among persons born at dif- 
ferent times. One may ask, however, what has been the mortality ex- 
perience of cohorts of people born at about the same time; such as 
people born around 1890, 1900, 1910, etc. Dorn [11] examined the trend 
of lung cancer mortality in the United States for cohorts of men and 
women born from 1850 on. He concluded that the observed trend is 
consistent with the hypothesis that some time in the past a carcinogenic 
agent (or agents) capable of producing cancer of the lung had been 
introduced into the environment and that (a) males were either exposed 
to it more intensely or for longer periods, or were more susceptible to it, 
and (b) exposure, at least initially, was greater for young men than for
those at more advanced ages. This hypothesis serves to explain the ob- 
served change in the shape of the curve of age-specific lung cancer 
mortality rates for males. Clemmesen, Nielsen, and Jensen [4] in ex- 
amining the trend of lung cancer mortality in Denmark concluded that 
the observed increase must have been caused by a carcinogenic influence
introduced during the early part of the 20th century.³
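
The cohort rearrangement described above can be sketched in a few lines. The rates below are made-up placeholders, not figures from Dorn [11] or any other source, and the helper name `by_birth_cohort` is hypothetical. The only point illustrated is that a rate observed in a given calendar year at a given age belongs to the cohort born in (year minus age).

```python
from collections import defaultdict

# Hypothetical age-specific death rates per 100,000, keyed by
# (calendar year, age at the midpoint of the age class). Illustrative only.
period_rates = {
    (1930, 40): 4.0, (1930, 50): 9.0, (1930, 60): 12.0,
    (1940, 40): 6.0, (1940, 50): 20.0, (1940, 60): 30.0,
    (1950, 40): 8.0, (1950, 50): 35.0, (1950, 60): 70.0,
}

def by_birth_cohort(rates):
    """Regroup period rates so each birth cohort's experience reads in age order."""
    cohorts = defaultdict(dict)
    for (year, age), rate in rates.items():
        cohorts[year - age][age] = rate
    return {born: dict(sorted(ages.items())) for born, ages in sorted(cohorts.items())}

cohort_rates = by_birth_cohort(period_rates)
for born, ages in cohort_rates.items():
    print(born, ages)
```

Reading across one calendar year mixes people born decades apart; reading along one birth year follows the same group of people as they age, which is the comparison the cohort hypothesis requires.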

It is interesting to note that, if the evidence pointing to a real in- 
crease in lung cancer mortality is accepted, one is led to the hypothesis 
that it is probably due to environmental factors. It would be difficult to 
attribute an increase of the magnitude observed to genetic change. A 
number of environmental changes, not mutually exclusive, have been 
suggested as causal factors in the increase of lung cancer mortality. 
These are: (1) increased use of cigarettes, (2) increased atmospheric 
pollution by motor vehicle exhausts, factory wastes, etc., and (3) in- 
creased occupational exposure to known cancer producing substances. 

No attempt will be made here to evaluate the relative importance of 
these suspected environmental agents, other than to point out a few 
observations. The number of persons engaged in occupations involving 
exposure to known carcinogens is very small. Thus, while additional 
occupational carcinogens may be identified in the future, only a small 
part of the observed increase in lung cancer mortality may be at- 





3 The cohort approach was first applied to lung cancer mortality data by Korteweg [16].
tributed to known industrial hazards. As indicated by Heller [14], “the
atmospheric pollution theory relies mainly on urban-rural differentials 
in mortality, plus the demonstration in urban atmosphere of substances 
which are carcinogenic to animals. . . . Inability to classify the popula- 
tion by degree of exposure to atmospheric pollution, and to compute 
attack rates for groups of people with varying exposure has been a 
major deterrent in testing the hypothesis of atmospheric pollution as a 
cause of this disease.”⁴ Available evidence linking tobacco smoking to
lung cancer is fairly extensive and impressive. 

One reason for suspecting an association between smoking and lung 
cancer is the observed parallelism between the trends for cigarette con- 
sumption and lung cancer mortality. Annual per capita consumption of 
cigarettes in the United States was less than 100 at the beginning of the 
century, about 600 in 1920, and more than 3,000 in 1950. The parallel- 
ism between per capita consumption of cigarettes and the rate of lung 
cancer mortality has been noted in a number of European countries, as 
well as in the United States [8]. The existence of a concomitant in- 
crease, while necessary, is not sufficient in itself to establish causality. 
However, the temporal association between cigarette smoking and lung 
cancer mortality plus the observation by various clinicians that most 
patients with lung cancer seemed to be heavy smokers led to the hy- 
pothesis that smoking is a cause of lung cancer. 


CASE HISTORY STUDIES 


The assertion that there is an association between smoking and lung 
cancer is based in large part on a series of studies in which the smoking 
histories of patients with cancer of the lung have been compared with 
smoking histories of persons free of lung cancer. Fourteen studies re- 
ported between 1939 and 1954 are described in Table 1. Although a few 
women were included in several studies, the results pertain essentially 
to white males. The findings are summarized in Table 2. In each study 
there was a smaller percentage of non-smokers and a higher percentage 
of heavy smokers among lung cancer patients than among the con- 
trols. Insofar as is generally known, every study of this type has shown 
this relationship. 

Cornfield [6] has shown that data of this type may be used to estimate 
the relative risk of developing a disease for persons possessing a given 
characteristic compared with persons not possessing that characteristic. 
Estimates of the relative risk of developing lung cancer for all smokers 





4 The evidence linking occupational exposure and air pollution to lung cancer has been reviewed 
by Hueper [15]. 
and for heavy smokers, as compared to non-smokers, are given in
Table 2. All the studies point in the same direction—a greater risk for 
smokers than for non-smokers, and a still greater risk for heavy smok- 
ers. However, the estimates of relative risk which these studies yield 
vary over a very wide range—from 1.2 to 36.4 for all smokers, and from 
1.9 to 79.0 for heavy smokers. 
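
A minimal sketch of how such an estimate can be formed from case and control series, assuming the estimate is the cross-product ratio of the two-by-two table, which approximates the relative risk when the disease is rare in the population. The counts are hypothetical and are not taken from any of the fourteen studies; the function name is likewise illustrative.

```python
def cross_product_ratio(case_exposed, case_unexposed, control_exposed, control_unexposed):
    """Odds of exposure among cases divided by odds of exposure among controls."""
    if min(case_unexposed, control_exposed) == 0:
        raise ValueError("ratio is undefined when a denominator cell is zero")
    return (case_exposed * control_unexposed) / (case_unexposed * control_exposed)

# Hypothetical series: 100 lung cancer cases and 100 controls.
cases = {"smokers": 95, "non_smokers": 5}
controls = {"smokers": 80, "non_smokers": 20}

estimate = cross_product_ratio(
    cases["smokers"], cases["non_smokers"],
    controls["smokers"], controls["non_smokers"],
)
print(round(estimate, 2))
```

Because the number of non-smokers among the cases sits in the denominator and is usually small, a difference of a few patients in that cell changes the estimate considerably.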

The variation in results is due in part to differences in defining the 
various smoking classes. For example, Doll and Hill based their classifi- 
cation on the most recent amount smoked, whereas Sadowsky, Gilliam, 
and Cornfield used the patient’s earliest smoking habits.⁵ In some cases,
cigars and pipe tobacco were converted to an equivalent number of 
cigarettes; in others the percentage of heavy smokers given in Table 2 
pertains to cigarette smokers only. Several of the studies indicate that 
the association between smoking and lung cancer is different for ciga- 
rette, cigar, and pipe smokers. Sadowsky, Gilliam, and Cornfield classi- 
fied their cases as smokers of cigarettes only, cigars only, pipes only, 
and mixed forms of tobacco. The relative risk of developing lung cancer 
for each of these groups as compared to non-smokers was 4.5, 3.0, 1.2, 
and 4.5 to 1 respectively. Thus, differences in the proportion of ciga- 
rette, cigar, and pipe smokers in the various studies may have con- 
tributed to the variation in results. In addition, part of the variation in 
results may be attributed to differences in the age distributions of the 
cases included in the various studies. Smoking habits differ among per- 
sons in different age classes. For example, in Levin’s study the per- 
centage of heavy cigarette smokers among the controls decreased from 
a high of 46 among men 30-39 years of age to a low of 7 among men 
70-79 years of age. In addition to differences in age between studies, 
there were differences in the age distributions of the lung cancer and 
control cases of a particular study. Some of the investigators adjusted 
for age, while others did not. 
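
The effect of age composition on an unadjusted percentage can be illustrated with the two figures quoted from Levin's study. The age mixes below are hypothetical, chosen only to show how far the crude percentage of heavy smokers can move when the age-specific habits are held fixed.

```python
# Percentage of heavy cigarette smokers among controls, from Levin's study
# as quoted in the text, for the youngest and oldest age classes only.
heavy_pct = {"30-39": 46.0, "70-79": 7.0}

def crude_pct(age_mix):
    """Weighted average of age-specific percentages; the shares must sum to 1."""
    if abs(sum(age_mix.values()) - 1.0) > 1e-9:
        raise ValueError("age shares must sum to 1")
    return sum(heavy_pct[age] * share for age, share in age_mix.items())

# Hypothetical age mixes for two series with identical age-specific habits.
younger_series = {"30-39": 0.8, "70-79": 0.2}
older_series = {"30-39": 0.2, "70-79": 0.8}

print(round(crude_pct(younger_series), 1), round(crude_pct(older_series), 1))
```

Two series with the same smoking habits at every age can therefore show very different crude percentages, which is why comparisons unadjusted for age are hard to interpret.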

The case history studies summarized above were subject to several 
possible sources of error—memory error on the part of the respondent, 
bias on the part of the interviewer or the respondent due to knowledge 
of the diagnosis, and inadequate sampling techniques. 

Marketing studies have indicated that, due to memory error, the 
personal interview and written questionnaire tend to yield inaccurate 
information concerning an individual’s buying habits or consumption 
of specific products. It is, therefore, possible that these clinical studies 





5 Doll and Hill used two additional bases for grouping their cases into smoking classes: maximum 
amount of regular smoking, and estimated life-time consumption of tobacco. The results for all three 
methods of classification were very similar. 
do not provide accurate information on the amount of tobacco
smoked. However, to invalidate the results of these studies on the basis 
of memory error, it would be necessary to assume that memory error 
was responsible for a systematic overstatement of smoking habits by 
patients with lung cancer or a systematic understatement by the con- 
trol groups. 

TABLE 2

SUMMARY OF FINDINGS REPORTED IN 14 RETROSPECTIVE
STUDIES OF SMOKING HABITS

[Table body not recoverable from the source.]

Note: Relative risk was computed by means of a technique developed by Cornfield [6, 7].

Heavy smokers are defined here as persons smoking more than one pack of cigarettes per day, or its 
equivalent. Approximated from variety of smoking classes, with lower limits ranging from 20 to 26 
cigarettes per day. 

a Also compared cases with cancers of the tongue, esophagus, stomach, colon and prostate with the
same controls. The stomach cancer cases (128) were found to resemble the controls very closely. The 
cases with all other forms of cancer combined (98) had fewer non-smokers and more heavy smokers than 
the controls, but the difference was less striking than for the lung cancer cases. 

b Numerical values of results approximated from graphic presentation. 

c Also compared cases with cancers of lip, tongue, mouth, larynx and pharynx, esophagus, and stomach
to same series of controls. Found no significant differences except for lip and for larynx and pharynx
combined. 

d Also compared cases with cancers of lip, tongue, mouth, pharynx, larynx, esophagus, and skin
to same series of controls. Found positive associations between laryngeal cancer and cigarette smoking 
and between lip cancer and pipe smoking. Found a negative association between smoking and skin 
cancer. 

e Also compared cases with cancers of lip, pharynx, esophagus, colon, and rectum to same series of
controls. Found no significant differences except for lip. 







The studies of Wynder and Graham, Doll and Hill, and Levin indi- 
cate that prior knowledge of the diagnosis on the part of the inter- 
viewer or the patient did not bias the results. In the Wynder and 
Graham study, 286 men with chest ailments were interviewed prior to 
diagnosis. One hundred were later found to have lung cancer, and 186 
were found to have other conditions. The smoking habits of these 100 
lung cancer cases were very similar to those of the 505 lung cancer 
cases whose diagnoses were known at time of interview, and the smok- 
ing habits of the 186 cases that were found not to have lung cancer 
strongly resembled those of the 780 non-cancer cases in the control 
series. In the study of Doll and Hill, and in Levin’s, some of the 
cases believed to have lung cancer at time of interview were later found 
to have other chest ailments—about 14 per cent and 19 per cent re- 
spectively. The smoking histories of these initially incorrectly diag- 
nosed cases were found to be significantly different from those of the 
correctly diagnosed lung cancer cases and were very similar to those 
of patients with diseases other than cancer. 

It has been argued that the use of hospital populations for the inves- 
tigation of possible association between a disease and a population 
characteristic may lead to spurious correlation due to uncontrolled 
factors involved in bringing members of the population to the hos- 
pital [1]. In this instance, the question is whether an individual’s 
smoking habits may influence the likelihood of his being hospitalized 
for a particular disease. Is a smoker with lung cancer more likely to 
be hospitalized than a non-smoker with lung cancer, while a smoker 
with another form of cancer or another disease is less likely to be 
hospitalized? It seems rather far-fetched to insist that this type of 
selection operates. However, in order to avoid any possible bias which 
might be involved in the use of hospitalized cases, Wynder and Corn- 
field based their study on death notices in the Journal of the American 
Medical Association. Questionnaires were sent to the families of physi- 
cians who had died of various forms of cancer. Physicians were selected 
because they were believed to represent a population group which is 
homogeneous economically, with little occupational exposure to 
respiratory irritants, and with equal access to diagnostic facilities. 
Schairer and Schöniger, and Mills and Porter also obtained their lung
cancer cases from death notices. Müller, Schairer and Schöniger,
Wassink, and Mills and Porter used samples of the general population 
as controls. These studies indicate that the reported association be- 
tween smoking and lung cancer is not peculiar to hospital populations. 
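
The selection argument can be made concrete with a small calculation. Everything below is hypothetical: a community in which smoking is unrelated to lung cancer, and deliberately extreme chances of hospitalization of the kind the argument supposes. The sketch shows only what such selection would do to a hospital series if it operated, not that it does operate.

```python
# Hypothetical community: smoking is equally common (60 per cent) among
# lung cancer patients and among patients with another disease.
community = {
    ("lung cancer", "smoker"): 600, ("lung cancer", "non-smoker"): 400,
    ("other disease", "smoker"): 600, ("other disease", "non-smoker"): 400,
}

# Hypothetical chance of being hospitalized for each group.
admission = {
    ("lung cancer", "smoker"): 0.9, ("lung cancer", "non-smoker"): 0.6,
    ("other disease", "smoker"): 0.3, ("other disease", "non-smoker"): 0.6,
}

def cross_product_ratio(table):
    """Cross-product ratio of smoking by disease for a two-by-two table."""
    return (
        table[("lung cancer", "smoker")] * table[("other disease", "non-smoker")]
    ) / (
        table[("lung cancer", "non-smoker")] * table[("other disease", "smoker")]
    )

# Expected numbers reaching the hospital under the assumed admission chances.
hospital = {group: count * admission[group] for group, count in community.items()}

print(round(cross_product_ratio(community), 2), round(cross_product_ratio(hospital), 2))
```

Under these assumptions the community shows no association while the hospital series shows a ratio of about three, so a spurious association of that size would require admission chances that differ sharply by smoking habit in opposite directions for the two diseases.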

The sampling techniques used in these case history studies were 
generally not sophisticated. It may be that the lung cancer cases studied
were not representative of all persons with lung cancer and that the
controls were not representative of the general population, but these 
two requirements are not essential. To study the relationship between 
smoking and lung cancer it is sufficient that the lung cancer cases and 
controls be drawn from the same population. In two studies (Doll and 
Hill, and Levin) segments of the lung cancer series and of the controls 
were definitely drawn from the same population—patients with a 
presumptive diagnosis of lung cancer at interview. Subdivision of 
these series into lung cancer cases and non-cancer cases on the basis 
of final diagnosis yielded fewer non-smokers and more heavy smokers 
among the patients with lung cancer. 

Levin [18] has summarized various questions regarding possible 
sampling error as follows: 

“1. How likely is it that the control and lung cancer cases in the 
studies reported were drawn from two different populations 
which differed significantly with respect to smoking and differed 
always so that the control cases came from a lighter-smoking 
population than the lung cancer cases? 

“2. How likely is it that these sampling biases would be more 
marked for cigarette smokers than for pipe and cigar smokers 
and more marked for heavy cigarette smokers than for light 
cigarette smokers? 

“3. How likely is it that lung cancer cases who smoke are more apt 
to go to the hospital or that patients with other diseases who 
smoke are less apt to do so? 

“4. How likely is it that these sampling biases should occur re- 
peatedly in studies made in different countries and in different 
sections of the same country?” 

He concluded that, “None of these contingencies seems very likely, 
and their occurrence to so marked a degree as to show the difference 
between smokers and non-smokers elicited by the various studies is 
even less likely.” 

Although the association between smoking and lung cancer indi- 
cated by the studies discussed above is probably valid for the specific 
population groups studied—selected segments of white men in the 
United States and Northern Europe—it may be invalid to generalize 
to broad population groups, including women, Negroes, Asiatics, etc. 
However, considering that this association has been found in a variety 
of groups, it would be surprising if it were not found to be widespread. 
Several large scale studies are now under way of the incidence of lung 
cancer in populations whose smoking characteristics are known. 


POPULATION STUDIES


The conclusion from the results of the case history studies, that 
smokers are more likely to develop lung cancer than non-smokers, is 
supported by preliminary findings of two large scale population studies 
—one in the United States and one in England. These studies were 
undertaken to obtain a direct measure of the risk of developing lung 
cancer for various classes of smokers in specific population groups. 
Some investigators feel that case history studies fall short as a basis for 
generalization concerning the relationship between smoking and lung 
cancer, because it is not known what specific populations generate
the lung cancer and control series. In addition, the case history studies 
reported to date have yielded a wide range of estimates of the relative 
risk of developing lung cancer for smokers as compared to non-smokers. 
By determining the incidence of lung cancer in a population whose 
smoking habits are known, the possible biases resulting from the study 
of populations of sick people can be avoided. Furthermore, by studying 
large population groups, fairly stable estimates of the lung cancer risk 
for various classes of smokers can be obtained. 

Beginning in January 1952, the American Cancer Society obtained 
smoking histories on approximately 200,000 men, 50 to 70 years of 
age, in nine different states. Volunteer workers each collected informa- 
tion from about 20 men of their acquaintance. Each volunteer reports 
annually on the survival status of the members of her group. If a man 
dies, a copy of the death certificate is obtained and in the case of a 
cancer death, additional information is secured from the appropriate 
hospital and physician. Hammond and Horn recently reported that in 
20 months, 4,854 men died, 167 of lung cancer [13]. The computed 
lung cancer death rate was 27 per 100,000 for non-smokers, 113 for 
men smoking less than a pack of cigarettes a day, and 239 for men 
smoking one pack or more a day.⁶ Thus, on the basis of these preliminary
results, men smoking more than a pack of cigarettes a day are
about nine times as likely to develop lung cancer as men who do not 
smoke. 
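
The "about nine times" figure follows directly from the three rates quoted from Hammond and Horn [13]. A short check, using only those rates (the variable names are illustrative):

```python
# Lung cancer death rates per 100,000 over the 20-month period,
# as quoted in the text from Hammond and Horn [13].
rates = {
    "non-smokers": 27,
    "less than a pack a day": 113,
    "one pack or more a day": 239,
}

baseline = rates["non-smokers"]
ratios = {group: rate / baseline for group, rate in rates.items()}

for group, ratio in ratios.items():
    print(f"{group}: {ratio:.1f}")
```

Since all three rates cover the same 20-month period, the ratios are unaffected by the fact that the rates are not annual.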

In October 1951, Doll and Hill sent questionnaires to all physicians 
on the Medical Register of the United Kingdom [9]. Out of 59,600 
doctors, 41,024 replied; 40,564 provided useable information. As doc- 
tors die, information on cause of death is obtained from the Registrars- 
General. The preliminary report is based on 789 deaths in 29 months 
among male physicians who were 35 years of age or over when they 
completed the questionnaire; 36 deaths were due to lung cancer. The 





6 These are not annual rates.
age-adjusted annual lung cancer death rates per 1,000 population were:


Smoking class                       Rate
Non-smokers                         0.00
1-14 cigarettes per day             0.48
15-24 cigarettes per day            0.67
25 or more cigarettes per day       1.14


As indicated in the case history studies, the risk increases as the 
amount of smoking increases. 
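
The gradient in the Doll and Hill rates can be checked mechanically. Because the rate for non-smokers is zero, a ratio against non-smokers is undefined; the sketch below therefore uses the lightest smoking class as the reference, which is a choice made here for illustration and not one stated in the preliminary report.

```python
# Age-adjusted annual lung cancer death rates per 1,000, as tabulated above.
rates = [
    ("Non-smokers", 0.00),
    ("1-14 cigarettes per day", 0.48),
    ("15-24 cigarettes per day", 0.67),
    ("25 or more cigarettes per day", 1.14),
]

# True when every class has a higher rate than the class below it.
values = [rate for _, rate in rates]
rises_with_dose = all(lower < higher for lower, higher in zip(values, values[1:]))

# Ratios against the lightest smoking class (the non-smoker rate of zero
# cannot serve as a denominator).
reference = rates[1][1]
ratios = {label: rate / reference for label, rate in rates[1:]}

print(rises_with_dose)
for label, ratio in ratios.items():
    print(f"{label}: {ratio:.2f}")
```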

A third large scale population study is being conducted by the 
National Cancer Institute of the United States Public Health Service 
in cooperation with the Veterans Administration. Early in 1954, ques- 
tionnaires requesting fairly detailed smoking histories were sent to 
291,000 veterans (mainly of World War I) holding U. S. Government 
Life Insurance policies. Replies have been received from about 220,000. 
As policies are paid, copies of the death certificates are obtained. No 
results are as yet available. 

The three studies discussed above might be criticized, because prob- 
ability sampling was not used. It might be contended that it will not be 
possible to use the findings as a basis for generalization to broad popula- 
tion groups. 

Whereas probability sampling is desirable, it is frequently impracti- 
cal. Furthermore, as indicated by Cochran, Mosteller, and Tukey [5], 
there are situations in which sampling may not be “good policy.” “The
inquirer may not be able to ‘afford’ the cost in time or money for a 
probability sample. ... The statement ‘he didn’t use a probability 
sample’ is thus not a criticism which should end further discussion and 
doom the inquiry to the cellar.” Smoking histories can be obtained
from the members of a representative sample of the United States 
population. However, the cost of keeping the members of such a sample 
under observation for an extended period in order to determine the 
incidence of lung cancer would be prohibitive. One of the virtues of the 
three population studies now in progress is that the individuals under 
study can be “followed” at a relatively small cost. 

No one is likely to insist that no useful knowledge can be obtained 
in the absence of probability sampling. If a consistent relationship be- 
tween smoking and lung cancer is found in a variety of population 
groups, the hypothesis that smoking is associated with lung cancer 
will be greatly strengthened. The studies now under way will make it 
possible to test the validity of this hypothesis for different geographic 
areas, rural as well as urban populations, and different occupational 
groups. However, since none of these studies include a large number 
of women, an investigation of the incidence of lung cancer in various 
classes of smokers in a female population seems to be called for. 


DISCUSSION


The demonstration that lung cancer occurs more frequently in smok- 
ers and still more frequently in heavy smokers does not and cannot 
prove that smoking per se causes cancer of the lung. In studies of hu- 
man populations, the individual’s past, current, and future smoking 
practices must be taken as found; they cannot be assigned in advance. 
Whether or not an individual smokes is largely a matter of personal 
choice. That choice may be bound up with other factors—social, 
occupational, economic, and biologic—which may be related to pre- 
disposition to lung cancer. However, if a substantial reduction in 
smoking were followed, some years later, by a marked decrease in the 
incidence of lung cancer, this might be considered very strong support- 
ing evidence that smoking “causes” lung cancer. 

Some investigators believe that the available evidence is sufficiently 
strong to warrant the presumption that the relationship between 
smoking and lung cancer is causal. Others want to wait for the final 
results of the population studies now under way. Still others contend 
that the available evidence is far from conclusive. Some are dissatisfied 
with the nature of the samples in both the case history and population 
studies in that the members of the various smoking classes are self- 
selected. Attempts are presently being made to investigate the biologic 
and personality characteristics of heavy smokers, light smokers, and 
non-smokers. The adherents to the industrial exposure—air pollution 
hypothesis believe that among the various possible agents, tobacco 
smoke is only a minor factor in the rapid increase in lung cancer mor- 
tality. Their position is based in large part on the experimental evidence 
in animals regarding the carcinogenicity of certain industrial products, 
and the general lack of success to date in producing cancer in animals 
with tobacco smoke or its condensates. 

Since lung cancer occurs among non-smokers, it is evident that if 
smoking is a cause it is not the only cause. Furthermore, it is difficult 
to explain observed variation in reported lung cancer mortality in 
different population groups entirely on the basis of variation in smoking 
habits. For example, the lung cancer mortality rate is about twice as 
high in England as in the United States; it is roughly twice as high in 
urban as in rural areas; and it varies over a wide range among urban 
areas in the United States. It seems likely that part of the variation 
may be due to differences in the quality of diagnosis and in the ac- 
curacy of reported causes of death. Some of the variation may be 
caused by differences in air pollution and industrial exposure. How 
much of the variation may be attributed to smoking in itself or to the 
co-carcinogenic action of smoking, air pollution, and industrial ex- 
posure is not known. Available evidence concerning the association 
between smoking and lung cancer is incomplete and additional information
is needed for a proper evaluation.

Mortality from lung cancer is much higher in males than in females 
and the difference has been increasing. In the United States in 1930, 
the ratio of the lung cancer mortality rate among white males to the 
rate among white females was 1.7. By 1950, this ratio had increased to 
4.6. Although accurate information on the smoking habits of the Ameri- 
can public is not available, it is generally believed that smoking among 
women was rather rare at the beginning of the century and did not 
become a common practice until some twenty years ago. Since lung 
cancer is believed to have a long latent period, even if smoking is a 
cause of lung cancer, the effect of this change in the smoking habits of 
women may not become apparent in mortality statistics for another 
decade or two. Since it has been shown that smoking is more common 
among women with lung cancer than among comparable controls 
(Doll and Hill [10]), a marked acceleration in the increase in the lung 
cancer mortality rate for women during the next one or two decades 
would lend additional support to the hypothesis that smoking is a 
cause of lung cancer. 

The population studies described above will also provide additional 
evidence. If lung cancer should consistently occur more frequently in 
smokers than in non-smokers, in various sub-groups of the populations 
studied, the case for a cause-effect relationship will be greatly strength- 
ened. To refute effectively the hypothesis that smoking is a cause of 
lung cancer would then require a reasonable explanation, other than 
causation, for the consistently observed association between smoking 
and lung cancer. 

Laboratory investigations are under way to determine whether a 
carcinogenic substance can be identified in tobacco and whether lung 
cancer can be induced in animals with tobacco smoke or condensates 
of tobacco smoke. Efforts to induce lung cancer in animals by inhalation 
have been unsuccessful to date. However, Wynder, Graham, and 
Croninger [29] recently reported that 44 per cent of 81 mice painted 
with tobacco tar developed true carcinoma of the skin. The production 
of any form of cancer in laboratory animals with tobacco products may 
lead to identification of the carcinogenic fraction. 

Successful production of lung cancer in animals with tobacco would 
tend to support the hypothesis that smoking causes lung cancer in 
humans. However, success in producing cancer in an animal does not 
prove that the same substance is carcinogenic to man. Conversely, 
failure to produce cancer in an animal with a specified carcinogen does 
not prove that the substance is not carcinogenic to man. It is of interest
that, although it is generally accepted that the chromate industry in-
volves a marked lung cancer hazard, Levin [18] reports that “applica-
tion or injection into animals of various materials to which workers
are exposed has thus far failed to produce malignant lung tumors.”


SUMMARY 


The available evidence on the relationship between smoking and
lung cancer is of four kinds:

1. The observed concomitant increase in recorded mortality from 
lung cancer and consumption of cigarettes; 

2. Fourteen case history studies indicating a smaller percentage of 
non-smokers and a higher percentage of heavy smokers among 
lung cancer patients than among comparable controls; 

3. Preliminary results of two population studies indicating a higher 
incidence of lung cancer in smokers than in non-smokers, and a
still higher incidence in heavy smokers; and 

4. The successful production, by at least one team of investigators, 
of skin cancer in animals with condensates of tobacco smoke. 

There is disagreement whether the evidence at hand warrants a con- 
clusion that smoking and lung cancer are causally related. As addi- 
tional evidence is gathered from observation of human populations 
and from experimentation with animals, conclusions will be reached 
which should achieve general acceptance. 
NONWHITE POPULATION INCREASES
IN METROPOLITAN AREAS*


Paul F. Coe
Federal Housing Administration 


This analysis is devoted primarily to nonwhite population changes
which have occurred in standard metropolitan areas (SMA’s) from
1940 to 1950. However, a helpful backdrop against which to evaluate 
the recent SMA nonwhite changes is the long-term growth pattern of 
nonwhites in the United States total. The need for this orientation is 
the greater, since the recent trend runs noticeably counter to that 
which obtained for over a century. 


NONWHITE POPULATION TRENDS IN THE UNITED STATES 


The nonwhite¹ population in the Nation increased by 17 per cent
over the last decade to almost 16,000,000 or over 10 per cent of the total 
of all races, as is shown in Table 1. Thus, for the second successive dec- 
ade the relative increase in nonwhites exceeded that of whites, and at 
a conspicuously accelerated rate. This marks a departure from the 
long-standing trend toward a declining proportion of nonwhites which 
began in 1810 and continued without interruption up to 1930. These 
recent relative nonwhite population gains are attributable, on the one 
hand, to a higher rate of natural increase in the nonwhite than in the 
white population.² Both the birth and the death rates, the two compo-
nents of natural increase, have improved materially in recent decades.
The much greater relative improvement in the nonwhite death rate 
over the decades, however, accounts for an important part of the recent 
relative increase in nonwhites. 

On the other hand, the reduction in white immigration explains a 
significant part of the recent relative increase in the nonwhite popula- 
tion. Although during the past two decades immigration has been a 
relatively unimportant factor in our national growth, amounting to 
about 5 per cent of our total population increase, from 1830 to 1930 it 
accounted for about one-third of the total population increase and ex-
ceeded one-half from 1900 to 1910.³ Certainly after 1860 the bulk of

* Acknowledgement is made of Austin R. Speake’s valuable contribution in assisting in the prepara- 
tion of many of the statistics which underlie this analysis. 

1 The term “nonwhite” consists of Negroes, Indians, Japanese, Chinese, Filipinos, Koreans, Asiatic
Indians, Polynesians, and other Asiatics. Persons of Mexican birth or ancestry who are not definitely 
Indian or of other nonwhite race were classified as white. 

2 U. S. Bureau of the Census, Statistical Abstract of the United States, 1953, Tables 56 and 67. 


3 U. S. Immigration and Naturalization Service, 1952 Annual Report, Table 1, and Table 1 of this
paper. 
TABLE 1

TREND OF NONWHITE POPULATION, 1790-1950, AND TREND OF
NONWHITE URBAN AND RURAL POPULATION, 1930-1950

                                                   Nonwhite     Per cent increase
Year and              Absolute number              as a % of       over decade
residence     Nonwhite        White    All races  all races  Nonwhite   White     All

U. S. total
1950        15,755,333  134,942,028  150,697,361       10.5      17.1    14.1    14.5
1940        13,454,405  118,214,870  131,669,275       10.2       7.7     7.2     7.3
1930        12,488,306  110,286,740  122,775,046       10.2      14.7    16.3    16.1
1920        10,889,705   94,820,915  105,710,620       10.3       6.3    16.0    14.9
1910        10,240,309   81,731,957   91,972,266       11.1      11.5    22.3    21.0
1900         9,185,379   66,809,196   75,994,575       12.1      17.1    21.2    20.7
1890         7,846,456   55,101,258   62,947,714       12.5      16.2    27.0    25.5
1880         6,752,813   43,402,970   50,155,783       13.5      23.2    26.4    26.0
1870*        5,481,157   34,337,292   39,818,449       13.8      21.2    27.5    26.6
1860         4,520,784   26,922,537   31,443,321       14.4      24.2    37.7    35.6
1850         3,638,808   19,553,068   23,191,876       15.7      26.6    37.7    35.9
1840         2,873,648   14,195,805   17,069,453       16.8      23.4    34.7    32.7
1830         2,328,642   10,537,378   12,866,020       18.1      31.4    33.9    33.5
1820         1,771,656    7,866,797    9,638,453       18.4      28.6    34.2    33.1
1810         1,377,808    5,862,073    7,239,881       19.0      37.5    36.1    36.4
1800         1,002,037    4,306,446    5,308,483       18.9      32.3    35.8    35.1
1790           757,208    3,172,006    3,929,214       19.3         -       -       -

Total nonfarm (Urban and rural nonfarm)
1950        12,422,237  115,226,774  127,649,011        9.7      42.8    24.2    25.8
1950†       12,418,743  115,202,079  127,620,822        9.7      42.7    24.2    25.8
1940         8,701,679   92,751,408  101,453,087        8.6      15.1     9.0     9.5
1930         7,557,038   85,060,495   92,617,533        8.2         -       -       -

Urban
1950         9,711,251   86,756,435   96,467,686       10.1      50.5    27.6    29.6
1950†        9,259,600   79,667,864   88,927,464       10.4      43.5    17.2    19.5
1940         6,450,879   67,972,823   74,423,702        8.7      19.6     6.9     7.9
1930         5,394,790   63,560,033   68,954,823        7.8         -       -       -

Rural nonfarm
1950         2,710,986   28,470,339   31,181,325        8.7      20.4    14.9    15.4
1950†        3,159,143   35,534,215   38,693,358        8.2      40.4    43.4    43.2
1940         2,250,800   24,778,585   27,029,385        8.3       4.1    15.2    14.2
1930         2,162,248   21,500,462   23,662,710        9.1         -       -       -

Rural farm
1950         3,333,096   19,715,254   23,048,350       14.5     -29.9   -22.6   -23.7
1950†        3,336,590   19,739,949   23,076,539       14.5     -29.8   -22.5   -23.6
1940         4,752,726   25,463,462   30,216,188       15.7      -3.6     0.9     0.2
1930         4,931,268   25,226,245   30,157,513       16.4         -       -       -

* Adjusted for underenumeration as estimated in the 1930 Census of Population, Vol. II, Chapter 2,
Table 4.

† Old urban definition.

Source: U. S. Bureau of the Census, 1950 Census of Population, Report P-A1, Table 2; P-B1, Table
34; and Historical Statistics of the United States, 1789-1945, Series B 13-23.
these immigrants were white and therefore augmented the white in-
crease as compared with the nonwhite.⁴ This tended to obscure the
strong underlying element of natural increase present among nonwhites 
for many decades. Indeed, had the immigration aspect of the popula- 
tion increase been eliminated from the white and nonwhite trends of 
the past century and a half, the nonwhite relative increase probably 
would have fairly closely approximated that of the white during the 
earlier period. 

For some of the past decades, the nonwhite population counts ap-
parently reflect a sizeable absolute amount of underenumeration.⁵ Al-
though the 1870 nonwhite figures shown in Table 1 have been adjusted
upward to reflect Census Bureau estimates of that underenumeration, 
the adjusted fluctuations for that date are still inexplicably erratic, as 
are those for 1890 and 1920, which also appear too low. Those fluctua- 
tions are not substantial enough, however, to affect materially the 
trends and general conclusions pointed out. Moreover, the tendency 
to undercount nonwhites, especially children under five years of age, 
is a continuing bias which therefore has little influence on the relative 
rate of change among nonwhites over the decades.⁶


SHIFTS TO NONFARM AREAS 


Not all types of areas shared equally in the general upsurge of non- 
white population. In fact, in the redistribution of the nonwhite popu- 
lation from 1940 to 1950, those living in rural farm areas declined by al- 
most 30 per cent or by 1,400,000, while the white population in those 
areas declined by only 22 per cent or by about 5,700,000,⁷ as shown in
Table 1. Nonwhites living in urban areas (using old urban definition 
for comparability) and in rural nonfarm areas, on the other hand, in- 
creased sharply and at somewhat similar rates, 44 and 40 per cent. 
However, in absolute terms, nonwhites in urban areas gained about 





4 Gunnar Myrdal, An American Dilemma (New York: Harper, 1944), p. 119; Henry C. Carey, The Slave
Trade (1853), p. 18; U. S. Bureau of the Census, Historical Statistics of the United States, 1789-1945,
Tables B 304-330; and Helen F. Eckerson and Gertrude D. Krichefsky, “A Quarter Century of Quota 
Restriction,” Monthly Review of Immigration and Naturalization Service, January 1950, p. 91. 

5 U. S. Bureau of the Census, 1930 Census of Population, Volume II, General Report, p. 26 and
Negro Population in the United States, 1790-1915, Chapter II. 

6 U. S. Bureau of the Census, 1950 Census of Population, Volume I, Number of Inhabitants, p. xiii.
See also footnote 5. 

7 According to the 1950 Census of Population, Volume II, Part 1, pp. 33-35, the 1950 definition of
farm population differs from that of 1940 and 1930 in that in 1950 persons living on what might have 
been considered farm land were classified as nonfarm if they paid cash rent for their homes and yards, 
as also were persons in institutions, summer camps, motels, and tourist camps. There is evidence that 
the farm population in 1950 would have been about 9 per cent larger had the 1940 classification been 
used. By appropriately augmenting the 1950 farm figures in order to arrive at an estimate of the number 
of nonwhites and whites living in rural farm areas on a basis comparable with the 1940 definition, the 
nonwhite decline would be 24 per cent, or 1,100,000; the comparable percentage decline in white persons 
living on farms is estimated to be almost 16 per cent, or about 4,000,000.
2,800,000 compared with 900,000 for the rural nonfarm classification.
The relative increase in the white population in the rural nonfarm areas
was at about the same rate as was that of the nonwhites; but in urban
places, the white increase was considerably less than half as great rela-
tively, 17 per cent, as was the nonwhite.

As a result of these population shifts together with the differing 
rates of natural increase, over 12,400,000 nonwhites lived in nonfarm 
areas in 1950, which amounts to almost 79 per cent of the total non-
whites in the United States as compared with 65 per cent in 1940 and
61 per cent in 1930. The white population in nonfarm areas, still rela-
tively greater than nonwhite, increased to over 85 per cent of the
United States total count of whites in 1950, as compared with 78 per 
cent in 1940 and 77 per cent in 1930. Percentage-wise, therefore, the 
nonwhites in nonfarm areas have been gaining rapidly over the past 
two decades as compared with the white population. Using the new
urban definition,⁸ however, nonwhites in urban areas have increased
much faster relatively than they have in the rural nonfarm areas. These trends
clearly indicate the sharp movement of the nonwhite population away 
from farms and especially to urban areas which are the industrial 
centers of the Nation. 
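
The nonfarm percentages cited in the preceding paragraph can be checked directly against Table 1. What follows is only a minimal sketch in Python and is not part of the original analysis; the dictionary layout and the function name `nonfarm_share` are illustrative assumptions, and the counts are transcribed from the U. S. total and total nonfarm rows of Table 1.

```python
# Counts transcribed from Table 1 (U. S. total and total nonfarm rows).
# The layout and names below are illustrative assumptions, not the author's.
TOTAL = {
    1950: {"nonwhite": 15_755_333, "white": 134_942_028},
    1940: {"nonwhite": 13_454_405, "white": 118_214_870},
    1930: {"nonwhite": 12_488_306, "white": 110_286_740},
}
NONFARM = {
    1950: {"nonwhite": 12_422_237, "white": 115_226_774},
    1940: {"nonwhite": 8_701_679, "white": 92_751_408},
    1930: {"nonwhite": 7_557_038, "white": 85_060_495},
}


def nonfarm_share(year, color):
    """Per cent of the given color group living in nonfarm areas in `year`."""
    return 100.0 * NONFARM[year][color] / TOTAL[year][color]


if __name__ == "__main__":
    for year in (1950, 1940, 1930):
        nonwhite = nonfarm_share(year, "nonwhite")
        white = nonfarm_share(year, "white")
        print(f"{year}: nonwhite {nonwhite:.1f} per cent, white {white:.1f} per cent")
```

Rounded to whole per cents, the computed shares agree with the figures quoted in the text for both groups in all three census years.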


NONWHITE SHIFTS TO SMA’s 


Turning now to population changes in SMA’s, which is the main 
theme of this paper, the rapid surge of nonwhite population into SMA’s
during the 1940’s is one of the most significant and dramatic popula- 
tion trends revealed by the last census. It is appropriate, therefore, to 
examine the attributes of the SMA, in order to understand better the 
type of locality to which the nonwhites are attracted so strongly. As a
class, SMA’s possess the highest degree of urbanization of all types of 
areas. They are the Nation’s big cities and their environs. Formally, 
the U. S. Bureau of the Budget has defined an SMA as a county or
group of contiguous, socially and economically integrated counties, 
which contains at least one city of 50,000 or more inhabitants, except 
in New England where somewhat different criteria are employed.⁹





8 According to the 1950 Census of Population, Volume II, Part 1, Chapter B, p. VI, the 1950 defini- 
tion of urban population was expanded to include as urban, persons living (a) in the densely settled 
urban fringe, both incorporated and unincorporated, surrounding cities of 50,000 population and (b)
in other unincorporated places of 2,500 or more. This change in definition resulted in an increase in the 
urban and a roughly compensating decrease in the rural nonfarm population count of nonwhites of about 
450,000 and of whites of almost 7,100,000 in 1950. Thus, the non-white totals were affected less, both 
relatively and absolutely than were the white by the change in definition. 

9 In addition to the county, or counties, containing a central city, or cities, of 50,000 population,
In order to highlight the trend of nonwhites to SMA’s, a table show-
ing pertinent statistics of nonwhite population changes in each of the
168 SMA’s has been prepared by the Market Analysis Section, Divi- 
sion of Research and Statistics, FHA. The summary data which that 
table presents for the United States, as well as the separate SMA data 
for the Washington, D. C., area, are reproduced in Table 2. The Wash- 
ington, D. C., data in this summary table serve to illustrate the type 
of statistics contained in a detailed table embracing each of the 168 
SMA’s in the continental United States in 1950, which table is avail-
able on request from the FHA.¹⁰ Because of space limitations, it is not
feasible to publish here the statistics for each of the SMA’s. Moreover, 
while the detailed table includes data for 1930, this analysis is focused 
largely on the changes from 1940 to 1950. 

Of the 15,755,000 nonwhite¹¹ persons living in the continental
United States in 1950, over 8,250,000 lived in SMA’s, an SMA in-
crease of 2,534,000 from 1940. An indication of the magnitude of that
SMA nonwhite increase can be had by noting that it exceeded the 1950 
all race population count of Philadelphia by almost one-half a million. 
Moreover, it also substantially exceeded the aggregate 1950 count of 
all persons living in Montana, Idaho, Wyoming, Utah, and Nevada. 

Nonwhites in SMA’s increased over twice as fast relatively as did the 
white population, 44 per cent compared with 20 per cent. Moreover, 
nonwhites in SMA’s increased over 2½ times as fast percentage-wise as
they did in all types of areas in the United States. As a result of these
changes, approximately 1 of every 10 persons living in SMA’s in 1950
was nonwhite. From 1930 to 1940 also the nonwhites increased rela- 
tively much faster in SMA’s than in the country at large, 16 per cent 
compared with 8 per cent, respectively. 

It is estimated that migration accounted for almost two-thirds of 





contiguous counties are included in an SMA if according to specified criteria they are essentially metro- 
politan in character and are socially and economically integrated with the central city. The criteria of 
metropolitan nature relate primarily to the character of the county as a place of work or as a home for 
concentrations of nonagricultural workers and their dependents. Since in New England the towns and 
cities are more important units administratively than is the county, they are the units in terms of which 
those SMA’s are delineated. In New England, the criterion of a minimum population density of 150 
persons per square mile also applies in most instances. The definition of an SMA is given more fully in 
the U. S. Bureau of the Census, 1950 Census of Population, Number of Inhabitants, U. S. Summary, 
Report P-A1, p. XXXI; for a precise delineation of each SMA, see Tables 26 and 27 of the U. S. Bureau
of the Census, Report P-A1.

10 Single copies of the complete table containing data for each of the 168 standard metropolitan 
areas may be obtained without charge, while the supply lasts, by writing to the Division of Research 
and Statistics, Federal Housing Administration, Washington, D. C. 

11 Of the nonwhite total, 15,042,286, or 95.5 per cent, were Negroes. The greater number of the
remaining 713,047 nonwhites were American Indians, with Japanese and Chinese next most numerous 
the nonwhite population increase in SMA’s¹² and in their central cities,
and for almost half of the nonwhite increase in that part of the SMA’s
lying outside the central cities (referred to hereafter as suburbs, even
though some of this area is open country). This may be inferred from
the fact that the nonwhite natural increase in the United States total
from 1940 to 1950 amounted to 17.1 per cent, whereas the nonwhite
population increase was 44.3 per cent in SMA’s, 48.3 per cent in their
central cities, and 32 per cent in the suburbs. A rough estimate of the
per cent of nonwhite in-migration can be computed indirectly by
algebraically subtracting from each of the three foregoing percentage
increases the 17.1 per cent rate of nonwhite natural increase in the
U. S. total and by then dividing that percentage difference by the total
per cent increase in nonwhites in each of the three classifications. Ac-
cordingly, in-migration from 1940 to 1950 is estimated to be about 61
per cent of the nonwhite population increase in the 168 SMA’s, about
65 per cent in all central cities, and about 47 per cent in the suburbs of
all SMA’s. It is obvious that the source of the migration was the non-
white population living outside SMA’s, inasmuch as that segment of the
nonwhite population of the United States actually declined by 233,000,
or 3 per cent, in contrast to the 17.1 per cent natural increase noted for
the total nonwhite population. The nonmetropolitan segment remains
an important potential source of further nonwhite migration, inasmuch
as 7,505,000 nonwhites, or 48 per cent of the United States nonwhite
total, still lived outside SMA’s in 1950.
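
The indirect estimate described above amounts to taking the excess of an area's per cent increase over the national rate of natural increase and expressing it as a share of the area's own per cent increase. A minimal sketch in Python follows; the function name `in_migration_share` and the dictionary of areas are illustrative assumptions, while the 17.1 per cent rate and the three area increases are the figures given in the text.

```python
# Indirect estimate of the share of a population increase due to in-migration:
# (area per cent increase - national natural increase) / area per cent increase.
# The function name and data layout are illustrative assumptions.
NATURAL_INCREASE_PCT = 17.1  # nonwhite natural increase, U. S. total, 1940-1950

AREA_INCREASE_PCT = {
    "168 SMA's": 44.3,
    "central cities": 48.3,
    "suburbs": 32.0,
}


def in_migration_share(pct_increase, natural_increase_pct=NATURAL_INCREASE_PCT):
    """Per cent of an area's increase attributed to in-migration."""
    if pct_increase <= 0:
        raise ValueError("the method applies only to a positive per cent increase")
    return 100.0 * (pct_increase - natural_increase_pct) / pct_increase


if __name__ == "__main__":
    for area, pct in AREA_INCREASE_PCT.items():
        print(f"{area}: about {in_migration_share(pct):.0f} per cent from in-migration")
```

Applied to the three increases quoted, the sketch gives the 61, 65, and 47 per cent estimates stated in the text. A negative result would indicate net out-migration under the same assumption of a uniform rate of natural increase.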

In the Washington, D. C. SMA, the locality for which exhibit data
are shown in Table 2, nonwhite in-migration percentages¹³ somewhat
exceeded those noted in the foregoing paragraph for the total of all
168 SMA’s in the U. S. Thus, it is estimated that about 65 per cent of
the total nonwhite decennial increase in the Washington SMA sprang
from in-migration, as did about 66 per cent in the city proper, and about

12 Nonwhite birth and death statistics are reported annually for most SMA counties and central
cities in the source given in footnote 13. By assembling the appropriate county natural increase figures
for the years 1940 through 1949 and by then subtracting them from the population increase over the
decade (given in the table referred to in footnote 10) the absolute and relative quantity of nonwhite
in-migration can be computed fairly accurately for any SMA. In contrast, the crude but very con-
venient method of estimating nonwhite and white in-migration described in this paper yields only ap-
proximate results for the reason that the rate of net natural increase inside SMA’s is less than that of
the population living outside SMA’s. Moreover, the rate of net natural increase varies noticeably among
the 168 SMA’s. Consequently, the validity of this indirect method of estimating in-migration may vary
substantially among the SMA’s. It is likely that this method will yield useful and fairly accurate
estimates of in-migration for SMA’s which experienced a high rate of population increase from 1940 to
1950. Conversely, for SMA’s which fell substantially below the average rate of population increase of
the Nation, this device may very likely produce untrustworthy estimates of migration. Note that the
estimate of nonwhite in-migration computed by this method for the Washington, D. C. SMA closely
approximated that developed from natural increase statistics.
54 per cent in the metropolitan fringe. Corresponding figures for the
white population indicate that about 73 per cent of the total white 
increase in this SMA in-migrated, as did about 89 per cent in the SMA 
fringe, whereas about 53 per cent of the net white natural increase in 
the city proper out-migrated—in contrast to the 66 per cent in-migra- 
tion noted for nonwhites in the city. In some of the other specific SMA’s 
the importance of migration varied widely from that of the Washing- 
ton, D. C. SMA, and from the aggregate total of all 168 SMA’s. Decen- 
nial natural increase statistics by color, required in the exact deriva- 
tion of in-migration estimates, have not been developed here for any 
SMA other than Washington, D. C. However, a rough approximation 
of the percentage increase ascribable to in-migration can be computed 
inferentially for any other SMA by using the method discussed in the 
previous paragraph. (Individual SMA percentage increases are avail- 
able from the source indicated in footnote 10.) Incidentally, in the 
Washington, D. C. SMA the actual rate of natural increase in non-
whites over the decade was computed to be 15.7 per cent¹³ before up-
ward adjustment for underregistration, quite close to the 17.1 per cent
natural increase noted for the U. S. total. 


INCREASES INSIDE CENTRAL CITIES 


Nonwhite persons have been gravitating rapidly to the large centers 
of population, and more particularly to the central cities of those areas. 
Thus, the nonwhites inside the central cities of the 168 SMA’s in- 
creased by 2,088,000, compared with 446,000 in the SMA suburbs. 
The movement further concentrated the nonwhites in the congested 
areas of the cities, so that in 1950 there were 6,411,000 nonwhites inside 
central cities and only 1,839,000 in the suburbs. While nonwhites in- 
side all central cities increased by 48 per cent, the comparable white 
population increased by only 10 per cent. As the SMA’s grew, the popu- 
lation overflowed the central city boundaries and into the suburbs. 
Thus, in the suburbs, the nonwhites increased by 32 per cent and the 
whites by 36 per cent, as is shown in Table 2. 

As a result of this redistribution, over half of the nonwhite popula- 
tion in the United States, 52 per cent, resided in SMA’s in 1950, com- 
pared with 42 per cent in 1940. Over the 10-year period, the proportion 
of the total white population living in SMA’s grew to 57 per cent in 
1950 from 54 per cent in 1940. 
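
The proportions of the nonwhite population living inside and outside SMA’s can be recovered from the published counts. The sketch below, in Python, is an illustration only; the function names are assumptions, the U. S. nonwhite totals are taken from Table 1, and the SMA nonwhite count and its decennial increase are taken from Table 4. The 1940 SMA figure is obtained by subtracting the increase from the 1950 count.

```python
# U. S. nonwhite totals from Table 1; SMA nonwhite figures from Table 4.
# Function names are illustrative assumptions.
US_NONWHITE = {1940: 13_454_405, 1950: 15_755_333}
SMA_NONWHITE_1950 = 8_250_210
SMA_NONWHITE_INCREASE = 2_533_673  # increase from 1940 to 1950


def sma_nonwhite(year):
    """Nonwhite population living in the 168 SMA's in 1940 or 1950."""
    if year == 1950:
        return SMA_NONWHITE_1950
    if year == 1940:
        return SMA_NONWHITE_1950 - SMA_NONWHITE_INCREASE
    raise ValueError("only 1940 and 1950 are covered")


def sma_share(year):
    """Per cent of the U. S. nonwhite population living in SMA's."""
    return 100.0 * sma_nonwhite(year) / US_NONWHITE[year]


def outside_sma(year):
    """Nonwhite population living outside SMA's."""
    return US_NONWHITE[year] - sma_nonwhite(year)


if __name__ == "__main__":
    for year in (1940, 1950):
        print(f"{year}: {sma_share(year):.1f} per cent in SMA's, "
              f"{outside_sma(year):,} outside")
    print("decline outside SMA's:", f"{outside_sma(1940) - outside_sma(1950):,}")
```

The same subtraction reproduces the nonmetropolitan figures cited earlier: about 7,505,000 nonwhites outside SMA’s in 1950, a decline of about 233,000 from 1940.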

These recent nonwhite shifts were under way a decade earlier, for 





13 Natural increase figures employed in these computations were compiled from the U. S. National
Office of Vital Statistics, Vital Statistics of the U. S., Part II, Place of Residence, annual volumes, 1940
through 1949.
from 1930 to 1940 the nonwhites inside central cities increased by 19
per cent whereas their increase in the SMA suburbs was only 8 per 
cent. 

The further concentration of nonwhites in central cities is not unlike 
the experience of earlier large-scale migrations, especially of white 
immigrants during the second half of the 19th century. Much of the 
housing occupied by in-migrant nonwhites was formerly occupied by 
the white population living in neighborhoods adjacent to established 
nonwhite neighborhoods. This is partly because of an apparent general 
but not universal preference on the part of nonwhites for close-in loca- 
tions with their available church, social, and entertainment facilities, 
rent differentials, proximity to sources of employment, and transporta- 
tion advantages; it is also partly because of a lack of available new 
construction for nonwhites in the outlying areas. In turn, the white 
households which had occupied housing in transition to nonwhite oc- 
cupancy relocate to a significant degree in the outskirts of the city 
proper or in the suburbs, as is demonstrated by figures given in subse- 
quent paragraphs. 


EXPANDING CENTRAL CITY BOUNDARIES 


It is likely that some of these decennial changes as between inside 
central cities and their suburbs were apparent rather than real, being
partly attributable to the expansion of the boundaries of many central
cities with a compensating contraction of that part of the SMA lying 
outside the central cities. This follows from the fact that the land area 
encompassed by these 193 central cities (contained in the 168 SMA’s) 
increased from 5,720 square miles in 1940 to 6,573 square miles in 
1950, or by 15 per cent over the decade. In 11 of these 193 central 
cities, most of which were in the West or in the South, there were
annexations ranging from 20 to 87 square miles. However, 121 central
cities underwent no change or increased by less than 2 square miles.
Generally, the suburban areas contained relatively fewer nonwhites
than did the central cities to which these areas were annexed. Inas-
much as in 1940 the population density of the central cities was 15
times as great as that of the suburbs, the 15 per cent increase in area
of central cities undoubtedly accounts for a much smaller, but un-
known, per cent increase in the population of these central cities.

14 U. S. Bureau of the Census, 1940 Census of Population, Volume I, Number of Inhabitants, Table
17, and Areas of the United States, Table 4; 1950 Census of Population, Land Area and Population of
Incorporated Places of 2,500 or More, Series GEO. No. 5. A compilation from these sources yields the fol-
lowing distribution of change in number of square miles from 1940 to 1950:

Change in number      Number of
of square miles     central cities
All cities               193
Decrease                   6
No change                 69
Increase                 118
   0.1 to  0.9            30
   1.0 to  1.9            22
   2.0 to  4.9            28
   5.0 to  9.9            13
  10.0 to 19.9            14
  20.0 to 49.9             8
  50.0 to 89.9             3

It should be emphasized, of course, that the suburban areas into 
which the corporate limits of the central cities overflow, tend to exhibit 
the same urban characteristics as do the adjacent areas of the central 
cities themselves. Therefore, usually the formal act of incorporation of 
a suburban segment simply recognizes legally and administratively at 
irregular intervals that condition of urban development which pro- 
ceeds at a more regular pace and according to more orderly socio-eco- 
nomic laws. Although annexations enlarged the area of 118 of the 193 
central cities, the over-all boundaries of the SMA’s themselves were 
held constant from 1930 to 1950 in the statistical comparisons pre- 
sented in this analysis. 


CENTRAL CITIES LOSING WHITE AND GAINING NONWHITE POPULATION 


In 22 central cities of SMA’s having an all race population of close 
to 9,000,000 in 1950, the nonwhite population increased by about
446,000, or 58 per cent, while the white population decreased by about
142,000, or almost 2 per cent from 1940 to 1950, as is shown in Table 3. 
These increase-decrease relationships, however, varied widely. 

In some cities these shifts were quite substantial. The city of Chi- 
cago, for example, lost 3,000 white persons but gained 227,000 non- 
white. Comparable figures for St. Louis are 4,000 white lost, com- 
pared with 45,000 nonwhite gained; Cleveland, 28,000 white lost, 
65,000 nonwhite gained; Pittsburgh, 15,000 white lost, 21,000 non-
white gained; Newark, 20,000 white lost, 29,000 nonwhite gained; and
Buffalo, 15,000 white lost, 19,000 nonwhite gained. 

In the suburbs surrounding each of these 6 cities the numerical in- 
crease in white population was at least 8 times as great as the nonwhite 
increase. 

In six of these 22 central cities, the white population decreased by 
over 5.0 per cent, with a decline of 7.3 per cent in Atlantic City. In all
but three of the 22 central cities the nonwhite population increased
much faster relatively than the white population declined. In only 7 of these 22 cities
was the all race population less than 100,000. 
TABLE 3

THE 22 SMA CENTRAL CITIES IN WHICH NONWHITES INCREASED
AND WHITES DECREASED, RANKED BY WHITE
PER CENT DECREASE, 1940-1950

                                       White    Nonwhite    Per cent change     Nonwhite
                       All races    decrease    increase                       as a % of
Central cities              1950     1940-50     1940-50     White  Nonwhite   all races

Atlantic City, N. J.       61,657     -3,552    [illeg.]      -7.3  [illeg.]   [illeg.]
Hazleton, Pa.              35,491     -2,536    [illeg.]      -6.7  [illeg.]   [illeg.]
Johnstown, Pa.             63,232     -4,079    [illeg.]      -6.3  [illeg.]   [illeg.]
Steubenville, Ohio         35,872     -2,011    [illeg.]      -5.8  [illeg.]   [illeg.]
Newark, N. J.             438,776    -20,385    [illeg.]  [illeg.]  [illeg.]   [illeg.]

Wilmington, Del.          110,356     -5,096    [illeg.]  [illeg.]  [illeg.]   [illeg.]
Lawrence, Mass.            80,536     -3,898    [illeg.]  [illeg.]  [illeg.]   [illeg.]
Lowell, Mass.              97,249     -4,212    [illeg.]  [illeg.]  [illeg.]   [illeg.]
Youngstown, Ohio          168,330     -6,273    [illeg.]  [illeg.]  [illeg.]   [illeg.]
Wheeling, W. Va.           58,891     -2,303    [illeg.]  [illeg.]  [illeg.]   [illeg.]

Cleveland, Ohio           914,808    -28,153    [illeg.]  [illeg.]  [illeg.]   [illeg.]
Jersey City, N. J.        299,017     -9,547       7,391  [illeg.]  [illeg.]   [illeg.]
Providence, R. I.         248,674     -6,911       2,081  [illeg.]  [illeg.]   [illeg.]
Buffalo, N. Y.            580,132    -15,186      19,417  [illeg.]  [illeg.]   [illeg.]
Pittsburgh, Pa.           676,806    -15,411      20,558  [illeg.]  [illeg.]   [illeg.]

Reading, Pa.              109,320     -2,262       1,014  [illeg.]  [illeg.]   [illeg.]
Trenton, N. J.            128,009     -1,880       5,192  [illeg.]  [illeg.]   [illeg.]
St. Louis, Mo.            856,796     -4,446      45,194  [illeg.]  [illeg.]   [illeg.]
Nashville, Tenn.          174,307       -491       7,396  [illeg.]  [illeg.]   [illeg.]
Chicago, Ill.           3,620,962     -3,039     227,193  [illeg.]  [illeg.]   [illeg.]

Utica, N. Y.              101,531       -128       1,141  [illeg.]  [illeg.]   [illeg.]
Chattanooga, Tenn.        131,041        -22       2,900  [illeg.]  [illeg.]   [illeg.]

Total                   8,991,793   -141,821     445,620  [illeg.]  [illeg.]   [illeg.]

* Less than 0.05 per cent.


Nonwhites in these central cities were present in about the same 
proportion, 13.5 per cent, as they were in the central cities of all 168 
SMA’s, 13.0 per cent. Yet in Nashville, Chattanooga, and Atlantic 
City they were more than twice as numerous relatively. The majority 
of these 22 cities are located in the Northeastern Region of the Nation. 
Only 3 are in the South. 


REGIONAL SHIFTS 


Although marked shifts in the nonwhite population occurred among 
the four regions of the United States during the decade, the South still 
had, in 1950, by far the largest number of nonwhites living in SMA’s, 
3,577,000. Moreover, nonwhites were over twice as numerous propor-
tionately in southern SMA’s as in the other regions, as is shown in Table
4.

The North Central SMA’s experienced the greatest absolute increase
in nonwhite population, 810,000 since 1940. That increase brought the
nonwhites to 2,022,000 in 1950, or over 8 per cent of all SMA
population in that region. It is important to note, however, that war
and postwar employment opportunities resulted in very large absolute
increases in the number of nonwhites living in the SMA’s of each region.


TABLE 4


REGIONAL TRENDS IN THE NONWHITE POPULATION
IN THE 168 SMA’S, 1940-1950


                                     Population in SMA’s of each region
                        Total
Year and color          168 SMA’s    Northeast    North Central   South        West

All races, 1950         84,500,680   30,891,820   24,491,036      17,200,809   11,917,015
Nonwhite, 1950           8,250,210    1,953,399    2,021,691       3,577,208      697,912
Population increase,
  1940-1950:
    White               12,690,526    2,210,306    2,902,505       3,892,188    3,685,527
    Nonwhite             2,533,673      646,298      809,836         686,315      391,224
Per cent increase,
  1940-1950:
    Total                                                               36.3
    White                                                               40.0
    Nonwhite
Nonwhite as per cent
  of total:
    1950
    1940
Number of SMA’s, 1950          168           39           53              58           18


The average rate of nonwhite increase in SMA’s was greatest in the 
West, 128 per cent. There were, however, substantial relative nonwhite 
SMA increases in the North Central Region, 67 per cent, and in the 
Northeast, 49 per cent. The white population in the SMA’s of each of 
these three regions, in contrast, increased at a much lower rate than 
did the nonwhite. Yet it is quite significant that in the SMA’s of the 
South the white population increased almost twice as fast relatively as 
did the nonwhite, 40 per cent compared with 24 per cent. 
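
The regional rates quoted above can be checked from the absolute figures of Table 4. The sketch below is only an illustration, not the author's own procedure: it assumes that each 1940 base equals the 1950 population less the 1940-50 increase, and that the white population is the all race total less the nonwhite. The names `REGIONS`, `pct_increase`, and `regional_rates` are hypothetical.

```python
# Figures transcribed from Table 4. Each entry is
# (all races 1950, nonwhite 1950, white increase 1940-50, nonwhite increase 1940-50).
REGIONS = {
    "Northeast": (30_891_820, 1_953_399, 2_210_306, 646_298),
    "North Central": (24_491_036, 2_021_691, 2_902_505, 809_836),
    "South": (17_200_809, 3_577_208, 3_892_188, 686_315),
    "West": (11_917_015, 697_912, 3_685_527, 391_224),
}


def pct_increase(pop_1950, increase):
    """Per cent increase over the decade.

    Assumption: the 1940 base is the 1950 population minus the increase.
    """
    base_1940 = pop_1950 - increase
    return 100.0 * increase / base_1940


def regional_rates(regions=REGIONS):
    """Return white and nonwhite per cent increases for each region."""
    rates = {}
    for name, (all_1950, nonwhite_1950, white_inc, nonwhite_inc) in regions.items():
        white_1950 = all_1950 - nonwhite_1950  # assumption: white = all races - nonwhite
        rates[name] = {
            "white": round(pct_increase(white_1950, white_inc), 1),
            "nonwhite": round(pct_increase(nonwhite_1950, nonwhite_inc), 1),
        }
    return rates


if __name__ == "__main__":
    for region, r in regional_rates().items():
        print(f"{region:14s} white {r['white']:6.1f}   nonwhite {r['nonwhite']:6.1f}")
```

Rounded to whole numbers, the nonwhite rates computed this way agree with the 49, 67, 24, and 128 per cent cited in the text, and the southern white rate with the 40 per cent cited.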

The observed tendency of the nonwhite population to increase rela- 
tively faster than the white was quite general among the SMA’s in 
each region outside the South. It occurred in 34 of the 39 SMA’s in the 
Northeast, in 14 of the 18 SMA’s of the West, and in 46 of the 53
SMA’s of the North Central Region. Contrariwise, in 51 of the 58
SMA’s in the South, the nonwhite population increased at a slower 
rate than did the white. 


HIGHEST PERCENTAGE NONWHITE 


Each of the 10 SMA’s with 50,000 or more nonwhites in 1950 and 
with nonwhites comprising at least one-third of the total population 
was located in the South, as is shown in Table 5. The percentage of 
nonwhite to total population, however, declined over the decade in 
each of these 10 SMA’s. The relative increase in number of white per- 
sons was much greater than that of nonwhites in each of these SMA’s. 
Nevertheless, it is important that, despite the more rapid relative non- 
white SMA gains in other parts of the country, there was an increase 
of over 10 per cent in 8 of these 10 southern SMA’s. In the Baton Rouge 
and Mobile SMA’s, nonwhites increased by over 50 per cent during 
the decade. 


TABLE 5 


THE 10 SMA’S WITH MORE THAN 50,000 NONWHITES IN WHICH 
NONWHITES COMPRISED MORE THAN 33 PER CENT OF THE ALL 
RACE TOTAL, RANKED BY PER CENT NONWHITE, 1950 








                          Nonwhite     Per cent increase,   Nonwhite population
Standard metropolitan     population   1940-50              as a % of all races
area                      1950         White    Nonwhite    1950     1940

Jackson, Miss.              63,917     51.0                 45.      51.7
Montgomery, Ala.            60,616     37.3                 43.      50.1
Charleston, S. C.           68,354     56.9                 41.      49.2
Savannah, Ga.               58,547     42.9                 38.      44.9
Memphis, Tenn.             180,185     48.9                 37.      43.3
Birmingham, Ala.           208,616     24.8                 37.      39.0
Columbia, S. C.             50,494     47.4                 35.      40.4
Augusta, Ga.                56,113     36.7                 34.      41.2
Mobile, Ala.                77,999     69.6                 33.      36.4
Baton Rouge, La.            52,341     93.3                 33.      38.0


LARGEST SMA’s


The heavy concentration of nonwhites in a few SMA’s is shown by 
the fact that the 10 SMA’s containing the largest number of nonwhites 
accounted for about half, 49 per cent, of the nonwhites living in all 168
SMA’s and for over one-fourth of all nonwhites in the Nation in 1950.
More nonwhites lived in the New York SMA alone than in any of 46 
States—1 out of every 15 nonwhites in the United States. 

The relative increase in nonwhites far exceeded that of whites in
these 10 SMA’s, 63 compared with 17 per cent, as is shown in Table 6,
and also substantially exceeded the nonwhite increase of 44 per cent in
all 168 SMA’s. Because of this large increase, the nonwhites as a per
cent of the all race total population in these 10 SMA’s in 1950 (10.9)
exceeded the proportion in all SMA’s and in the United States total,
9.8 and 10.5 per cent, respectively. In 1940 the proportion of
nonwhites to all races had been smaller in the ranking 10 SMA’s than in
all SMA’s or in the United States as a whole.


TABLE 6


THE 22 SMA’S WITH NONWHITE INCREASES OF MORE THAN
20,000, 1940-1950, RANKED BY NUMBER OF NONWHITES


                                       Nonwhite population    Per cent increase,   Nonwhite     Non-
Standard metropolitan     All races                Increase,  1940-50              as a % of    white
area                      1950         1950        1940-50    White    Nonwhite    all races,   rank
                                                                                   1950         1950

New York, N. Y.           12,911,994   1,046,045     377,191    8.0      56.4         8.1         1
Chicago, Ill.              5,495,364     605,238     270,373    8.9      80.7        11.0         2
Philadelphia, Pa.          3,671,048     483,927     147,084   11.3      43.7        13.2         3
Detroit, Mich.             3,016,197     361,927     189,149   20.4     109.5        12.0         4
Washington, D. C.          1,464,089     342,159     111,332   52.2      48.2        23.4         5
Los Angeles, Cal.          4,367,911     276,330     148,291   46.7     115.8         6.3         6
Baltimore, Md.             1,337,373     266,671      71,895   20.5      36.9        19.9         7
St. Louis, Mo.             1,681,281     216,454      65,006   14.4      42.9        12.9         8
San Francisco, Cal.        2,240,767     210,547     145,816   45.3     225.3         9.4         9
Birmingham, Ala.             558,928     208,616      29,442   24.8      16.4        37.3        10
Total, largest 10         36,744,952   4,017,914   1,555,579   17.2      63.2        10.9

New Orleans, La.             685,405     200,523      40,742   23.5      25.5        29.3        11
Memphis, Tenn.               482,393     180,185      24,890   48.9      16.0        37.4        12
Atlanta, Ga.                 671,797     165,816      22,422   35.0      15.6        24.7        13
Cleveland, Ohio            1,465,511     154,117      65,888   11.2      74.7        10.5        14
Houston, Texas               806,701     150,452      46,310   54.5      44.5        18.7        15
Pittsburgh, Pa.            2,213,236     137,261      24,372    5.4      21.6         6.2        16
Norfolk-Portsmouth, Va.      446,200     122,837      35,481   88.5      40.6        27.5        17
Cincinnati, Ohio             904,402      95,656      26,636   12.6      38.6        10.6        18
Kansas City, Mo.             814,357      88,032      20,330   17.3      30.0        10.8        19
Dallas, Texas                614,799      83,352      21,639   57.8      35.1        13.6        21
Mobile, Ala.                 231,105      77,999      26,321   69.6      50.9        33.8        23
Buffalo, N. Y.             1,089,230      47,786      23,905   11.4     100.1         4.4        42
Total, next 12            10,425,136   1,504,016     378,936   20.3      33.7        14.4

Total, largest 22         47,170,088   5,521,930   1,934,515   17.9      53.9        11.7

The 1,556,000 increase in the number of nonwhites living in the 10 
largest SMA’s represented 61 per cent of the 2,534,000 nonwhite in- 
crease in all SMA’s. The nonwhite segment of each of these 10 SMA’s 
in 1950 was in fact itself the equivalent of a large city. In the New York 
SMA, for example, there were 1,046,000 nonwhites in 1950. Only 14 
SMA’s had a greater all race population than that in 1950. Over 605,000 
nonwhites lived in the Chicago SMA in 1950. Even the Birmingham 
SMA, which ranked 10th in number of nonwhites, had 209,000, a 
larger number than the all race population in any of 81 SMA’s. Of the 
10 SMA’s with the largest number of nonwhites, only Washington, 
Baltimore, and Birmingham are located in the South. 
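
The shares cited for the 10 largest SMA’s follow directly from the totals of Tables 4 and 6. The sketch below is offered only as an arithmetic check; the constant names and the helper `share` are hypothetical, and the figures are those printed in the two tables.

```python
# Totals transcribed from Table 6 ("Total, largest 10") and Table 4 (168 SMA's).
ALL_RACES_1950_TOP10 = 36_744_952
NONWHITE_1950_TOP10 = 4_017_914
NONWHITE_INCREASE_TOP10 = 1_555_579
NONWHITE_1950_ALL_SMA = 8_250_210
NONWHITE_INCREASE_ALL_SMA = 2_533_673


def share(part, whole):
    """Return part as a per cent of whole."""
    return 100.0 * part / whole


def concentration_summary():
    """Collect the three shares discussed in the text, rounded to one decimal."""
    return {
        # Nonwhites as a per cent of all races in the 10 largest SMA's, 1950.
        "pct_nonwhite_top10": round(share(NONWHITE_1950_TOP10, ALL_RACES_1950_TOP10), 1),
        # Share of all SMA nonwhites living in the 10 largest SMA's, 1950.
        "pct_of_sma_nonwhites": round(share(NONWHITE_1950_TOP10, NONWHITE_1950_ALL_SMA), 1),
        # Share of the 1940-50 nonwhite SMA increase occurring in those 10 SMA's.
        "pct_of_sma_increase": round(share(NONWHITE_INCREASE_TOP10, NONWHITE_INCREASE_ALL_SMA), 1),
    }


if __name__ == "__main__":
    for label, value in concentration_summary().items():
        print(f"{label:22s} {value:5.1f}")
```

The three results round to the 10.9, 49, and 61 per cent given in the text.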

Expanding this group to include each SMA in which the nonwhite
population increased by 20,000 or more between 1940 and 1950 gives
22 SMA’s in all, shown also in Table 6. Of the 12 additional
SMA’s, seven are located in the South, whereas only 3 of the 10 SMA’s
having the largest number of nonwhites are in the South. Also, of these
12 SMA’s, only four exceeded the 44.3 per cent average nonwhite
increase for the 168 SMA’s. Moreover, in only two of these 12 SMA’s,
Buffalo and Pittsburgh, did the nonwhites as a per cent of total popu- 
lation fail to equal the 9.8 per cent average for all 168 SMA’s. The rela- 
tive increase in nonwhites for these 12 SMA’s was 33.7 per cent, as 
against 20.3 per cent for the whites. This percentage increase of non- 
whites was far below that of the 10 SMA’s with the largest number of 
nonwhites, most of which are located outside the South. 

For the 22 SMA’s combined, the nonwhite population increased by
almost 2,000,000, or 54 per cent, to a little over 5,500,000, which
represents 11.7 per cent of all races. The comparable white population
increase was only 18 per cent. Factors contributing to the substantial
movement of the nonwhites to the larger cities include the greater in- 
come opportunities in the cities, less need for farm labor with the in- 
creased mechanization of the farms and plantations, and personal 
preference to move to the North and West. 


LARGEST RELATIVE INCREASES 


Whereas there was only one SMA, Albuquerque, in which the white 
population doubled from 1940 to 1950, there were 31 SMA’s in which 
the nonwhite population more than doubled. The nonwhite increase in 
these 31 SMA’s aggregated over 630,000 or one-fourth of the 2,534,000 
nonwhite increase in all 168 SMA’s during the decade.


TABLE 7


THE 31 SMA’S IN WHICH NONWHITE POPULATION DOUBLED
1940-1950, RANKED BY HIGHEST PER CENT OF
NONWHITE INCREASE


                                          Nonwhite population    Per cent increase,   Nonwhite
SMA’s in which               All races                Increase,  1940-50              as a % of
nonwhites doubled            1950         1950        1940-50    White    Nonwhite    all races,
                                                                                      1950

SMA’s having 10,000 or more nonwhites

San Francisco, Cal.           2,240,767                 145,816
San Diego, Cal.                 556,808                  14,121
Milwaukee, Wis.                 871,047                  13,623
Los Angeles, Cal.             4,367,911                 148,291
Portland, Oregon                704,829      15,949       8,484

Denver, Colo.                   563,832      20,190      10,681
Detroit, Mich.                3,016,197     361,927     189,149
Flint, Mich.                    270,963      14,277       7,451
Fresno, Cal.                    276,515      19,165       9,754
Buffalo, N. Y.                1,089,230      47,786      23,905

Total, 10 SMA’s              13,958,099   1,013,253     571,275

SMA’s having fewer than 10,000 nonwhites

Racine, Wis.                    109,585       1,880
Manchester, N. H.                88,370
San Bernardino, Cal.            281,642       8,641
Ogden, Utah                      83,319
Lubbock, Texas                  101,048

Saginaw, Mich.                  153,515
Grand Rapids, Mich.             288,292
Erie, Pa.                       219,388
Utica-Rome, N. Y.               284,262
Spokane, Wash.                  221,561

South Bend, Ind.                205,058
Lima, Ohio                       88,183
Tacoma, Wash.                   275,876
Madison, Wis.                   169,357
Kalamazoo, Mich.                126,707

Rochester, N. Y.                487,632
New Britain-Bristol, Conn.      146,983
Salt Lake City, Utah            274,895       3,871
Fort Wayne, Ind.                183,722       5,368
Springfield-Holyoke, Mass.      407,255       7,459

Peoria, Ill.                    250,512       6,507

Total, 21 SMA’s               4,447,162     104,929

Total, 31 SMA’s              18,405,261   1,118,182


Among these 31 SMA’s where the nonwhite population more than
doubled were 10 with over 10,000 nonwhites in 1950, shown in Table 7. 
In the San Francisco-Oakland SMA, the nonwhites more than trebled. 
For these 10 SMA’s the nonwhite population increase averaged 129 
per cent, compared with 34 per cent for the white. Although in most 
of these 10 largest SMA’s the rate of increase for both the nonwhite 
and the white population was larger than the average rate for all 
SMA’s, in Milwaukee, Flint, and Buffalo the white increase was less 
than average, while the nonwhites more than doubled. On the average, 
nonwhite population comprised only 7.3 per cent of the total popula- 
tion of these 10 SMA’s. All 10 are located outside the South. 

Twenty-one of the 31 SMA’s in which the number of nonwhites 
doubled from 1940 to 1950, had fewer than 10,000 nonwhites in 1950. 
In eleven of the 21 the per cent increase in white population was below 
the 168 SMA average. For the 21 SMA’s, the nonwhite population in- 
crease averaged 134 per cent, compared with 23 per cent for the white. 
The nonwhite population comprised only 2.4 per cent of the total popu- 
lation of the 21 SMA’s, compared with 7.3 per cent for the top 10 
SMA’s. 

Of the 31 SMA’s in which nonwhites doubled from 1940 to 1950, 
only Lubbock, Texas, is in the South. 


NONWHITE PERCENTAGE INCREASE EIGHT TIMES THE WHITE 


During the last decade there were 10 SMA’s with an all race popula- 
tion of 100,000 or more in which the per cent increase in the nonwhite 
population was ten or more times that of the white population.
Moreover, in 16 SMA’s of 100,000 or more population, the nonwhite
percentage increase was eight or more times as great as that of the
white, as shown
in Table 8. Half of these 16 SMA’s had nonwhite percentage increases 
of over 100 per cent, 2 had increases between 75 and 100 per cent, and 
only 6 had increases of less than 50 per cent. None of these SMA’s was 
located in the South. The average percentage increase for all 16 SMA’s 
was 84 per cent for the nonwhites and 8 per cent for the whites. The 
nonwhite population comprised 6.6 per cent of that for all races and 
numbered 733,000 in these 16 SMA’s in 1950. 


VARIATIONS IN RELATIVE GROWTH 


Although the increase in nonwhite population in all 168 SMA’s 
averaged 44 per cent from 1940 to 1950, that rate of increase cannot 
be regarded as typical, as is shown in Table 9. Thus, in only 16 of the 
168 SMA’s did the increase in nonwhites range between 40 and 50
per cent. In short, variation characterizes the rate of nonwhite growth
over the decade, and to a somewhat smaller extent the white growth 
as well. 

The nonwhite changes ranged from a decrease of 37 per cent for the
Fall River SMA to an increase of 266 per cent for the Racine SMA. In
all, 3 SMA’s showed a decline of over 20 per cent and 3 underwent an
increase of over 200 per cent. A total of 9 SMA’s recorded a decline in
nonwhite population. Yet in 61 SMA’s the nonwhites increased by 50
per cent or more. The comparable white population increase of over 50
per cent was experienced by only 27 SMA’s. Despite the wide range in
nonwhite increases noted among the SMA’s, about half of the 168
SMA’s had increases of from 10 to 49 per cent. In two-thirds of the
SMA’s the white population experienced a comparable range of
increase.


TABLE 8


THE 16 SMA’S OF MORE THAN 100,000 POPULATION IN WHICH THE
NONWHITE PER CENT INCREASE WAS EIGHT OR MORE
TIMES THAT OF THE WHITE, 1940-1950


                                  Absolute increase     Per cent increase    Nonwhite
Standard metropolitan                                                        as a % of
area                              White      Nonwhite   White    Nonwhite    all races

Albany-Schenectady-Troy, N. Y.     44,741      4,106              82.5
Buffalo, N. Y.                    106,838     23,905             100.1
Chicago, Ill.                     399,464    270,373              80.7
Duluth, Minn.-Superior, Wis.       -1,630        371              30.4
Grand Rapids, Mich.                37,630      4,324             149.0

Johnstown, Pa.                     -7,766        704
Lowell, Mass.                       2,835         94
Milwaukee, Wis.                    90,539     13,623
Racine, Wis.                       14,171      1,367
Reading, Pa.                       12,776      1,080

Rochester, N. Y.                   44,933      4,469
Saginaw, Mich.                     17,477      5,570
Sioux City, Iowa                       98        192
Springfield-Holyoke, Mass.         38,749      3,826
Utica-Rome, N. Y.                  19,575      1,524

Wilkes-Barre-Hazleton, Pa.        -49,286          9

Total                             771,144    335,537

As was pointed out previously, inside the central cities of SMA’s the 
nonwhites increased much more sharply than did the white population 
over the decade, on the average 48 per cent compared with 10 per cent, 
respectively. Thus, in 42 central cities the nonwhites doubled compared 


TABLE 9 


FREQUENCY DISTRIBUTION OF PER CENT CHANGE IN NONWHITE 
AND IN WHITE POPULATION IN SMA’S, IN CENTRAL CITIES, AND 
OUTSIDE CENTRAL CITIES, 1940-1950 


NONWHITE POPULATION INCREASES 








Outside central 
cities 


In central 


SMA total cities* 


Per cent 
increase or 





decrease 


White 


Non- 
white 


White 


Non- 
white 


White 


Non- 
white 





Total 


168 


193 


193 


168 





—40.0 & over 
—20.0 to —39.9 
.0 to —19.9 


.0 to 
10.0 to 


20.0 to 
30.0 to 
40.0 to 
50.0 to 
60.0 to 


70.0 to 
80.0 to 
90.0 to 
100.0 to 
125.0 to 


150.0 to 
175.0 to 


9.9 


149. 


174.9 
199.9 


200.0 and over 


2 
1 
10 
12 
24 


26 
21 
18 
12 
12 





Mean 
Median 








44.3 
36.5 








48.3 
45.0 








32.0 
29.2 





* The 168 SMA’s contained 193 central cities, since there were 2 or more central cities in 21 SMA’s.

with only 6 for the white. Indeed, in 8 of these central cities, the
nonwhites trebled as against only 1 central city for the white. At the
other end of the distribution, nonwhites in 10 central cities actually
declined, whereas the white population declined in 28 central cities. It
is apparent also from these figures that the nonwhite and white
population changes inside central cities deviated from the averages even
more widely than did those for the entire SMA’s.

It was only in the suburbs that the nonwhite increases from 1940 
to 1950 fell somewhat below those of the white population, 32 com- 
pared with 36 per cent on the average. Here again, however, the popu- 
lation changes among nonwhites varied much more widely than did 
those for the whites. Accordingly, nonwhites actually declined decen- 
nially in the suburbs of 34 SMA’s compared with only 13 for the white 
population. Moreover, the nonwhites doubled in the suburbs of 28 
SMA’s compared with only 12 for the white. Of these, the nonwhites 
trebled in the suburbs of 13 SMA’s, while the white failed to treble in 
the suburbs of a single SMA. The SMA suburbs, therefore, experienced 
a much greater diversity in rate of nonwhite growth than in rate of 
white growth over the decade. 

From the foregoing data it is abundantly evident that the rate of 
both nonwhite and white population growth varies widely among the 
SMA’s, among their central cities, and among their suburbs. Such 
averages as arithmetic means and medians, therefore, fail to character- 
ize accurately the rate of population growth experienced by most 
SMA’s. Even the modal rate of change (the percentage by which the 
largest number of SMA’s grew) failed to include either the median or 
the mean for any of the three nonwhite and three white classifications. 
The chances are, in fact, that the rate of growth of any specific SMA 
may fail by a substantial margin to approximate the composite average 
of the 168 SMA’s. The frequency distributions shown in Table 9 indi- 
cate the wide latitude which is found in the growth patterns of the 
SMA’s. 
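
The gap between a composite average and the experience of individual SMA’s can be illustrated with the one complete series of rates legible in this article, the nonwhite per cent increases of the 22 SMA’s in Table 6. The sketch below is an illustration under that assumption and does not reproduce the 168-SMA distributions of Table 9; the names `NONWHITE_PCT_INCREASE`, `summarize`, `aggregate_rate`, and `count_within` are hypothetical.

```python
from statistics import mean, median

# Nonwhite per cent increases, 1940-50, transcribed from Table 6 (22 SMA's).
NONWHITE_PCT_INCREASE = {
    "New York": 56.4, "Chicago": 80.7, "Philadelphia": 43.7, "Detroit": 109.5,
    "Washington": 48.2, "Los Angeles": 115.8, "Baltimore": 36.9,
    "St. Louis": 42.9, "San Francisco": 225.3, "Birmingham": 16.4,
    "New Orleans": 25.5, "Memphis": 16.0, "Atlanta": 15.6, "Cleveland": 74.7,
    "Houston": 44.5, "Pittsburgh": 21.6, "Norfolk-Portsmouth": 40.6,
    "Cincinnati": 38.6, "Kansas City": 30.0, "Dallas": 35.1, "Mobile": 50.9,
    "Buffalo": 100.1,
}

# "Total, largest 22" row of Table 6: nonwhite population 1950 and increase.
TOTAL_NONWHITE_1950 = 5_521_930
TOTAL_NONWHITE_INCREASE = 1_934_515


def summarize(rates):
    """Unweighted mean, median, and range of the individual SMA rates."""
    values = list(rates.values())
    return {
        "mean": mean(values),
        "median": median(values),
        "low": min(values),
        "high": max(values),
    }


def aggregate_rate(pop_1950, increase):
    """Composite rate for the group, taking the 1940 base as 1950 minus increase."""
    return 100.0 * increase / (pop_1950 - increase)


def count_within(rates, center, width):
    """Number of SMA's whose rate lies within `width` points of `center`."""
    return sum(1 for v in rates.values() if abs(v - center) <= width)


if __name__ == "__main__":
    stats = summarize(NONWHITE_PCT_INCREASE)
    composite = aggregate_rate(TOTAL_NONWHITE_1950, TOTAL_NONWHITE_INCREASE)
    near = count_within(NONWHITE_PCT_INCREASE, stats["mean"], 10.0)
    print(f"unweighted mean {stats['mean']:.1f}, median {stats['median']:.1f}")
    print(f"composite rate  {composite:.1f}")
    print(f"range           {stats['low']:.1f} to {stats['high']:.1f}")
    print(f"SMA's within 10 points of the mean: {near} of {len(NONWHITE_PCT_INCREASE)}")
```

Even in this small group the mean, the median, and the composite rate differ from one another, and only a handful of the 22 SMA’s fall within 10 points of the mean, which is the point the text makes for the full set of 168.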


PROPORTION OF NONWHITES 


The variation in percentage of nonwhite population in the 168 
SMA’s also was very wide, ranging all the way from 0.2 per cent in four 
SMA’s to 45.0 per cent in the Jackson, Mississippi SMA. Although 
nonwhites comprised almost 10 per cent of all persons living in SMA’s 
in 1950, they amounted to less than 5 per cent of the total in approxi- 
mately half of the SMA’s, as is shown in the frequency distribution of 
Table 10. This means, of course, that while many of the smaller SMA’s


TABLE 10


FREQUENCY DISTRIBUTION OF NONWHITE POPULATION AS A
PER CENT OF THE ALL RACE TOTAL IN SMA’S, IN CENTRAL CITIES,
AND OUTSIDE CENTRAL CITIES, 1940 AND 1950








In central Outside central 


SMA total cities* cities 


Nonwhite asa % 
of total 





1950 1940 1950 1940 1950 1940 





Total 168 168 193 168 168 





Oto 4.9 83 91 77 111 113 
5.0 to 9.9 41 26 15 
10.0 to 14. 20 9 14 
15.0 to 19. 14 6 6 
20.0 to 24. 9 _ 3 


25.0 to 29. 10 
30.0 to 34. 6 
35.0 to 39. 
40.0 to 44. 
45.0 to 49. 


50.0 to 54.9 
55.0 to 59.9 
60.0 or more 

















Mean 9.8 8.3 13.0 10. 
5.2 4.6 7.4 


0.0 2 5.4 
Median 4.9 8 3.7 











* The 168 SMA’s contained 193 central cities, since there were 2 or more central cities in 21 SMA’s. 


had a very small proportion of nonwhites, a number of the more popu- 
lous SMA’s had larger than average proportions of nonwhites. In no 
SMA did the nonwhites amount to as much as half of the total popula- 
tion in 1950, although in 21 at least 1 out of every 4 persons was non- 
white, and in 3 SMA’s nonwhites comprised over 40 per cent of the all 
race total. 

In 1940, nonwhites in SMA’s were relatively less numerous, 8.3 per 
cent of all races, than they were in 1950. In 1940, moreover, well over 
half of the SMA’s had a nonwhite count of less than 5 per cent of the 
total. Yet 9 SMA’s in 1940 had a nonwhite population of over 40 per 
cent of the total. Moreover, in 2 of these SMA’s there were relatively
more nonwhites in 1940 than in 1950: Jackson, Mississippi, and
Montgomery, Alabama. Actually, the absolute number of nonwhites
increased in both of these SMA’s from 1940 to 1950. However, their
white population increased so much faster that the proportion of non- 
whites declined. 

Nonwhites in the 193 central cities of SMA’s were relatively more 
numerous than they were in the SMA totals. Moreover, they increased 
more sharply from 10 per cent in 1940 to 13 per cent in 1950 on the 
average. Yet there were only 6 central cities in 1950 in which over 40 
per cent of the total population was nonwhite, compared with 9 in 1940. 
At the other extreme also there was a decline from 98 in 1940 to 77 
central cities in 1950 in which nonwhites amounted to less than 5 per 
cent of all races. 

In the suburban parts of the 168 SMA’s, the proportion of non- 
whites decreased slightly from 1940 to 1950. Of the total, there were 
111 in which the nonwhite population amounted to less than 5 per cent 
of the all race total in 1950—a slight decline from 1940. The number 
in which nonwhites amounted to from 5 to 10 per cent increased 
noticeably, however, from 15 in 1940 to 26 in 1950. At the same time, 
SMA suburban areas with over 40 per cent nonwhite population 
dropped sharply from 11 in 1940 to only 4 in 1950. As was pointed out 
previously, however, the decennial changes as between inside central 
cities and their suburbs were spurious to a minor degree in that there
was an expansion of the boundaries of many central cities and a
compensating shrinkage of their suburbs.


SMA’s WITH SMALL PROPORTION OF NONWHITES 


Much recent emphasis has been placed on those SMA’s having a 
large proportion of nonwhite population. In many SMA’s, however, 
nonwhites are proportionately very few in number. For example, 
there were 40 SMA’s scattered throughout 18 states in which the 
nonwhite population comprised less than two per cent of all races in 
1950. In 20 of these SMA’s, the nonwhites aggregated less than one 
per cent, as is shown in Table 11. The average nonwhite population 
for the entire 20 SMA’s was only 0.5 per cent of their all race popula- 
tion count. 

Although relatively few nonwhites lived in these 20 SMA’s, those 
nonwhites increased by 29 per cent compared with only a 4 per cent in- 
crease in the white population from 1940 to 1950. Small though their 
0.5 per cent nonwhite proportion was, in 3 of the 20 SMA’s the non- 
whites doubled over the last decade, and in Manchester the nonwhites 
trebled. Yet 4 of these SMA’s actually lost nonwhite population in 
amounts ranging from 2 to 37 per cent. In 2 of the 4, Altoona and
Scranton, the white population also declined, and in Fall River the
white population increased by only 1.8 per cent. With the apparent lack
of economic opportunity, in-migration of nonwhites would not be
expected.

These SMA’s tend to be the smaller ones, with 14 of the 20 having
an all race population of less than 200,000. None had a total population
of a half million. In absolute terms, 13 of the 20 had fewer than 1,000
nonwhites present in 1950. Of course, none of these 20 SMA’s was
located in the South.


TABLE 11


THE 20 SMA’S IN WHICH THE NONWHITE POPULATION COMPRISED
LESS THAN ONE PER CENT OF THE ALL
RACE TOTAL, BY STATE, 1950


                                                   Nonwhite      Per cent increase,
                                      All races    population    1940-50
Standard metropolitan area            1950         1950          White    Nonwhite

Iowa, Cedar Rapids                      104,274        826                   20.9
Maine, Portland                         119,942        465                   17.7
Mass., Brockton                         129,428      1,105                   19.
       Fall River                       137,298        353                  -37.
       Lawrence                         125,935        380                   30.

       Lowell                           133,928        324                   40.
       Worcester                        276,336      1,936                   17.
Mich., Bay City                          88,461        380                   53.
Minn., Duluth-Superior                  252,777      1,591                   30.
N. H., Manchester                        88,370        156                  205.

N. Y., Binghamton                       184,698        899                   11.
       Utica-Rome                       284,262      2,595                  142.
Pa., Allentown-Bethlehem-Easton         437,824      2,575                   24.
     Altoona                            139,514      1,150                   -2.

     Scranton                           257,396        818       -14.
     Wilkes-Barre-Hazleton              392,241        973       -11.
S. Dak., Sioux Falls                     70,910        426        22.        82.
Tex., Laredo                             56,141        114        22.       -32.
Wis., Kenosha                            75,238        284        18.        36.5

      Madison                           169,357      1,059        29.       124.8

Total                                 3,524,330     18,409         4.3       28.9


CAUSES OF VARYING RATES OF SMA GROWTH


A number of factors combine to produce the differences in rate of
SMA population growth.⁴ Since the present analysis has not isolated
the separate effect of each of these factors by standardizing for size
of SMA, for age of SMA, for per cent of nonwhites, for regional
location, for type of industry dominating the SMA, for rate of
industrial expansion, etc., only general conclusions can be drawn.
Unquestionably, however, the influence of regional location is one of
the foremost factors, as is shown in Table 4. Accordingly, western
SMA’s may be expected to experience a high rate of growth, primarily
because of their regional location. Other important causative growth
factors are believed to be the age of the central city core of the SMA,
with older SMA’s growing more slowly; the economic and industrial
expansion and virility of the SMA and of the general area in which it is
situated; the climatic, resort, and health appeal of the locality; and
to some extent the size of the SMA, although there appear to be
differences as between nonwhite and white components, with the nonwhite
population growing faster than average in the largest SMA’s and the
white growing at about the average rate there.

Moreover, variations in the tempo of industrial activity account for 
much of the redistribution of population from 1940 to 1950. Thus, war
and postwar expansion in such fields as aircraft, shipbuilding,
automotive, light metals, munitions, chemicals, and plastics, as well as
contractions in such fields as coal mining, railroads, and textiles in
some localities, played a real part in the variations in rate of growth
of SMA’s.
After the war, many GI’s further accentuated the shifts by failing to 
return to their former homes and instead by moving to the localities 
from which their wives came, or to those which they had seen or in 
which they had been stationed during the war, or to those which 
promised favorable employment opportunities. The surprising and 
generally unexpected high continuing levels of postwar employment 
made this high degree of mobility much easier to achieve than would 
be expected normally. In fact, a recession probably would tend to slow 
down, but not halt, the population mobility in general and the shift to 
SMA’s in particular. 

Certain special factors are noteworthy as playing influential roles in
the steady influx of nonwhites to SMA’s. Foremost among these is
the desire of nonwhites to avail themselves of greater economic
opportunity open to them in some localities and regions than in others.
This subsumes the opportunity to use and develop their improved training
and skills; the freedom to join unions with their promise of equality
and a measure of security, without respect to race or color; the
legislative protection of fair employment laws and enforcement machinery
covering certain cities and States; and the increasing tendency of
industries in some areas to employ members of minority groups at all
occupational levels on the basis of their competitive abilities.
Important also are such sociological factors as the search for a larger
measure of freedom from segregation and from related social and economic
barriers predicated on race or color; and the appeal of urban life and
its attendant opportunities to that part of the rural population with a
pioneering urge to improve its economic and social life. As these and
doubtless other motives impel nonwhites to endeavor to improve their
situation, certain traditional routes of migration often are followed.
Thus, New York is considered a mecca for nonwhites in the South and for
Puerto Ricans as well. Similarly, Detroit, Chicago, and St. Louis have
long been among the major focal points for nonwhite migration even in
the depression years of the Thirties, as have West Coast SMA’s during
and since World War II. Most Texas SMA’s also had substantial
in-migration of nonwhites during the Thirties. Many of these new
in-migrants remain in their adopted cities; but many others later move
on to more distant cities.

4 For a detailed treatment of growth of all races in SMA’s, see Donald J. Bogue, Population Growth
in Standard Metropolitan Areas, 1900-1950, U. S. Housing and Home Finance Agency, Washington 25,
D. C.

It is impossible to quantify the effects of each of the many factors 
which prompt nonwhites to migrate or to remain at home. Ordinarily, 
when a person or a family undertakes as far-reaching an act as moving 
to a new city, more than one reason underlies that decision. It is clear, 
however, that the movement of nonwhites to SMA’s during the past 
decade is one of the most dramatic population trends to emerge from 
the last census, and is continuing today. 


SUMMARY 


Some of the more salient findings regarding nonwhite population 
increases in SMA’s from 1940 to 1950 follow: 

1. As a preface to a consideration of SMA changes, it is emphasized 
that the nonwhite population in the entire Nation has begun to in- 
crease at a faster rate than the white population, in contrast to the 
century-old precedent of a smaller relative increase in nonwhites. 

2. The nonwhite population is being redistributed generally as is
evidenced by the fact that nonwhites living in rural farm areas declined
by 30 per cent, while those living in urban areas increased by 50 per
cent over the decade.

3. Nonwhites living in SMA’s increased over twice as fast as did the
white population from 1940 to 1950, or by 44 per cent and 20 per cent,
respectively. This is relatively 2½ times as fast as nonwhites increased
in all types of areas in the United States.

4. In-migration accounted for almost two-thirds, by estimate, of the
nonwhite population increase in SMA’s and in their central cities, and
for almost half of their increase in the SMA suburbs.

5. In their movement to SMA’s nonwhites have gravitated very 
sharply to the central cities, whereas the white shift has been markedly 
to the suburbs. Indeed, in a number of central cities the nonwhites 
increased, while the white population actually declined. This probably 
is ascribable partly to the preference of many nonwhites for close-in 
locations and partly to the lack of new construction available to non- 
whites in outlying areas. 

6. Regional nonwhite increases were numerically largest in the North 
Central SMA’s and relatively largest in the SMA’s of the West. Yet the 
nonwhites have by no means forsaken the southern SMA’s, for the 
SMA’s of the South still embrace the largest number of nonwhites. 

7. There are great concentrations of nonwhites in a few SMA’s. In 
fact, half of all nonwhites in the 168 SMA’s and one-fourth of all non- 
whites in the Nation live in 10 SMA’s. Nonwhites in these 10 SMA’s in- 
creased much faster from 1940 to 1950 than did the white population, 
relatively. 

8. Yet nonwhites did not increase equally in all SMA’s over the 
decade. In 9 SMA’s the nonwhites actually decreased in number, in 
31 they doubled, and in 3 they trebled. The white population doubled 
in only one SMA. 

9. Nonwhites are not present to the same degree in all SMA’s. Thus, 
in 40 SMA’s scattered throughout 18 States nonwhites comprised 
less than 2 per cent of the all race total in 1950. Still in 14 SMA’s, 
nonwhites amounted to over 30 per cent of the all race total. 

10. The reasons for the nonwhite population surge to SMA’s cannot 
be quantified. They encompass such considerations, however, as better 
employment opportunities, the search for greater freedom from segrega- 
tion, and the appeal of urban life. 





THE PROSPECTS FOR POPULATION FORECASTS 


John Hajnal*
University of Manchester 


“[No one may] consult a soothsayer, a mathematician, or a forecaster. ... May
curiosity to foretell the future be silenced for ever.”¹ However, even
the death penalty was, it seems, insufficient to eradicate the con- 
demned practice. Recently, new “scientific,” and not infrequently 
mathematical, techniques of foretelling population growth have pro- 
vided novel methods, and a new group of experts, for satisfying this 
basic human need. 

Elaborate sets of population projections have in the last twenty-five 
years become a well established feature of the demographic literature 
of the Western World. Projections by professional students of popula- 
tion have been increasingly relied on by others—economists, politi- 
cians, civil servants—who in former times would have made their own 
guesses about future population as the occasion arose. Widespread at- 
tention was first focussed upon the projections when the prospect of 
diminishing growth or even decline, which they presented, came as a 
shock to the public. But the vogue of projections has continued in 
spite of lack of agreement between the predictions and the facts (which 
became manifest when the sudden spurt of population growth in the 
1940’s belied the prophets of doom). 

It is the purpose of this paper to argue 

(1) that population projections in the future as in the past will
often be fairly wide of the mark, as often as simple guesses
would be;

(2) that, nevertheless, the frequent preparation of projections will
continue;

(3) that a projection can be useful as a piece of analysis even if its
accuracy is low;

(4) that simple, unpretentious short term projections should be used
to meet most practical needs for population forecasts;

(5) that greater flexibility and variety in techniques for projecting
births need to be developed.

* A shortened version of this paper was presented to the World Population Conference held in
Rome in September 1954.


1 Nemo haruspicem consulat, aut mathematicum, nemo hariolum. . . . Sileat perpetuo divinandi
curiositas. (Codex Theodosianus, lib. IX. tit. XVI. t. 4.)
If some of the following seems unduly dogmatic or provocative, I can
only appeal in extenuation to the limitations of space. 

Prophecy about the future of human societies is an uncertain busi- 
ness; there seems no reason to expect more success in the prediction 
of numbers than in the forecasting of other basic features of historical 
development. It is true that population forecasting, like economic 
extrapolation, is a much more “scientific” and respectable business 
than, say, predicting the date of outbreak of the next war or the name 
of the next Pope. This impression is, no doubt, in part due to the use 
of numerical techniques of extrapolation which may suggest analogies 
with astronomers’ calculations of the future position of the stars. 
Another important factor, however, is the protection given to popula- 
tion forecasters by the slow rate at which population changes take 
place. Usually, almost any method of extrapolation from the past will 
give results that do not look absurd for a few years. (Indeed, widely 
different methods may often give similar results.) This fact enhances 
the reputation of projectors in another way. The consumer of fore- 
casts does not realize that, so far as predictive accuracy is concerned, 
much of the elaborate technique of forecasters is expended in vain; 
crude methods could have achieved equally good results. Population 
forecasters are not the only ones who have benefited from this public 
ignorance. The public opinion poll forecasts of presidential election 
results have been no better than had the predictions been obtained by 
assuming at each election that the votes of the major parties were 
divided in the same proportion as at the previous election [12]. In 
econometric forecasting also, the use of elaborate models has been 
known to produce worse results than the most naive extrapolation [3]. 

In the past, prophets have often been upset not so much because 
their arguments were wrong as because they turned out to be irrelevant. 
It is the failure of human history to repeat itself, the appearance of 
the new and the unexpected that renders the search for good methods 
of forecasting hopeless. However much we improve our tools to take 
care of all that happened in the past, something will sooner or later 
crop up for which we are unprepared. Consider, for example, the fore- 
bodings expressed by Malthus and his followers about the British 
food supply and, consequently, about future population growth in 
Britain. It was not that they underestimated the possibilities of 
British agriculture; the argument about its potentialities became 
irrelevant because importation of food from overseas developed on 
such a scale that British agriculture declined. Something unsuspected, 
the development of railways which made practicable the transportation
of food from the interior of the new world, upset their prophecies.

The situation which faced forecasters in the 1930’s similarly illus- 
trates how something essentially novel may play havoc with the best 
judgment based on the past. It seems almost impossible that anyone, 
however great his ingenuity and however extensive his knowledge of 
the facts, could have foreseen the “baby boom” of the 1940’s. Someone 
who realized what is now known about the possibilities of “bunching” of 
births (as a result of changes in the pattern in which successive cohorts 
distribute their births over time) might have taken a more cautious 
attitude to the forecasts of population decline, but he could hardly 
have foreseen birth rates as high as those of the 1940’s. Indeed, suppose 
a man had had the gift of prescience about everything in the 1940’s 
except the number of births. He would have known about the war 
and about the economic prosperity. Even if he had in addition been 
the most competent demographer in the world, he would almost 
certainly have guessed wrong. He would probably have argued, for 
example, that in wartime Britain, groaning under austerity and ex- 
posed to German bombardment, the number of births would (in spite 
of full employment) be very low. If with all our hindsight we cannot 
blame the demographers of the 1930’s, what reason have we to expect 
better luck in the future? 

The factors whose effects on future growth we can calculate are 
likely to be frequently outweighed by the unpredictable. It is this 
which accounts for the failure of more complex techniques to yield 
more accurate results than simple techniques and which casts doubt on 
the value of forecasting. We cannot hope to develop better methods 
which yield forecasts clustering more and more closely round the true 
future population. New and more complex techniques which may yet 
be invented are, I think, just as liable as past techniques to be fairly 
often upset by the unpredictability of history. They will probably 
just as often—and that means rather frequently—give results which 
are very wide of the mark and less accurate than crude guessing. 

This view of the situation is confirmed in the field of local forecast- 
ing, i.e., forecasting the population of smaller geographical units within 
nations. National forecasts are not often repeated by identical methods 
in identical circumstances. However, in local forecasting much more 
experience has accumulated, particularly in the United States, 
by which the success of different methods may be judged by a fair 
number of trials in comparable circumstances (for example, when forecasts
for each of the forty-eight states are made). Series of forecasts
have been specifically computed for such comparisons. The following
quotation from a survey of the American experience by Siegel
illustrates the point I wish to make: “From the point of view of 
accuracy, the simple methods appear to be just as accurate as the com- 
plex ones. ... Average errors are not always disturbingly large for 
short-term forecasts, but extremely large errors may appear in par- 
ticular cases from the very start with any of the methods tested. The 
tests are consistent in showing an increase in the error with an increase 
in the length of the forecast period; 20-year forecasts generally have 
large errors and a sizeable proportion of errors exceeding 10 per cent” 
[16]. 

One argument from the experience of population forecasting may 
be urged against the view here taken that future events are essentially 
unpredictable. Projecting the survivors of cohorts already born—e.g., 
for projections of the labor force—has generally proved reasonably 
successful. However, it may be doubted whether success in this field 
really amounts to successful forecasting of future events. Projected 
mortality rates and forecasts of deaths have in fact often been very 
inaccurate. However, with the low death-rates now current in Western 
countries, a large error in projecting deaths corresponds to only a small 
proportionate error in the survivors (except at advanced ages). To 
predict the population aged 20 five years hence amounts essentially to 
noting that those now aged 15 will then be 20. This sort of calculation 
(which may involve technical difficulties, for example owing to errors 
of enumeration) is very useful indeed, but it is hardly the prediction 
of anything which is seriously in doubt. Of course, the procedure in- 
volves the prediction, or assumption, that there will be no cata- 
strophic mortality (say, owing to H-bombs). But, to the making of this 
assumption the special techniques of demography are not relevant. 
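
The arithmetic behind this point can be sketched briefly. The figures below are illustrative assumptions and are not taken from any life table: a cohort of 100,000 now aged 15, a true five-year probability of dying of 1 per cent, and a forecaster who guesses only half that.

```python
# Illustrative only: the cohort size and both death probabilities are assumed.

def survivors(cohort, death_prob):
    """Persons of a cohort still alive at the end of the period."""
    return cohort * (1.0 - death_prob)

cohort_aged_15 = 100_000   # hypothetical number now aged 15
actual_q = 0.010           # assumed true 5-year probability of dying
projected_q = 0.005        # forecaster's guess, half the true value

actual = survivors(cohort_aged_15, actual_q)
projected = survivors(cohort_aged_15, projected_q)

# Proportionate errors in the two quantities being forecast
error_in_deaths = (projected_q - actual_q) / actual_q
error_in_survivors = (projected - actual) / actual

print(f"error in projected deaths:    {error_in_deaths:+.1%}")
print(f"error in projected survivors: {error_in_survivors:+.2%}")
```

Under these assumed figures the forecast of deaths is wrong by half, yet the forecast of those aged 20 five years hence is wrong by only about one half of 1 per cent.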

Are population projections still produced and used by the public 
only because knowledge of their failure in the past has not been widely 
diffused outside the circle of demographers?² Or because experts, who
are aware of the fate of past forecasts, are nevertheless still hopeful 
that better methods may be found? It seems doubtful whether a radical 
pessimism about the possibility of forecasting has much chance of 
general acceptance in the present atmosphere of confident expansion in 
the social sciences. But even if such a view were to spread it would not 
mean the abandonment of all population projections. 

The demand for guesses at future population seems to have been 





2 So far as I know, efforts to publicize the failure of population projections have been made only in 
the United States. Professor Joseph S. Davis has published some scathing comments with this aim. 
(See, for example, reference [4]). 
growing. The increase in government planning and the growth of
social insurance and social services is perhaps the main reason. At 
the same time, the statistical material available for the study of popu- 
lation and the techniques used for their analysis have become much 
more plentiful and complex. Increasingly they require specialists to 
handle them. It is improbable that detailed analysis of the factors at 
work in population growth and guessing on the basis of that analysis 
will be given up in favor of the older practice of guessing, without analy- 
sis, on the basis of total population figures and simple rates of growth. 
Analysis will not protect us against unforeseen developments, but it 
may often give useful warning against uncritical extrapolation of past 
experience. Moreover, the need in modern times is often not for fore- 
casts of total population but for projections of special classes, such as 
particular age groups, numbers of people with certain numbers of chil- 
dren, etc., and such projections generally necessitate the use of spe- 
cialist techniques. 

It seems unlikely that the number of whole or part-time population 
specialists will drop or that they will cease to turn out population 
forecasts for a variety of purposes. The situation in economics offers 
something of a parallel. The economists demonstrated conspicuous 
failure in prediction much earlier than the demographers—at the time 
of the great depression. Yet the number of economists employed in 
industry, government and international bodies has risen prodigiously 
since the slump. The rapid expansion in government activities and in 
the volume of economic statistics has no doubt helped to bring this 
about. An analogous situation exists, on a smaller scale, in demography. 
Population forecasters have no need to fear that failure will result in 
loss of jobs. 

The demand for population forecasts is in part nourished by motives 
which are slow to react to evidence about their inaccuracy. Even 
very inaccurate forecasts often meet a need, the same age-old need 
which (perhaps even more than curiosity) has caused people to turn to 
forecasters or soothsayers of all kinds, the need to take decisions. This 
seems to be the meaning of the demand so often faced by the statisti- 
cian “Give me a forecast, any figure is better than none.” 

It is sensible to bring to bear on a decision whatever scientific appa- 
ratus is available. There is, however, some danger in this process. 
Anyone in the early 1940’s having to plan the educational facilities 
for children of elementary school age in ten years’ time would, as we 
now know, have acted more wisely if, instead of looking at population 
projections, he had assumed that nothing was known about the future 
numbers of births, that an increase was as likely as a decrease, and that
plans must be prepared for both eventualities. However, decisions in 
government and business must usually be based on some view of 
likely future development and we can assume that population projec- 
tions will continue to be used for this purpose. 

What then can be done towards better population forecasting? We 
might begin by assessing what went wrong with past forecasts. A 
striking indictment could be drawn by compiling examples of wide 
discrepancies³ between populations as forecast and as enumerated. A
second line of criticism, which was mentioned earlier, might be that 
the complex exertions of the forecasters achieved predictions which 
were often further removed from the facts than naive extrapolations 
(based on, say, the growth of the 5 years before the forecast was made). 
These criticisms concern the accuracy of the projections. However, 
there is yet another question. The authors of projections have in any 
case usually guarded themselves against attack on grounds of accuracy 
by statements to the effect that projections are not predictions, but 
only show the results of certain hypothetical assumptions. The ques- 
tion is: were the assumptions relevant? Though their authors cannot 
be blamed for it, the projections of the 1930’s and early 1940’s turned 
out to be wrong in analysis. They were intended principally to demon- 
strate one thing, the prospect of the end of population growth and of 
actual decline in the near future. If the populations of Western nations 
today fell short of their predicted numbers by as great a percentage as 
they in fact exceed them, this would probably be considered a tri- 
umphant justification of the analysis underlying the projections. It 
could also be considered as confirmation of the analysis if, though the 
populations are now larger than predicted, decline had been averted
only by drastic measures to raise the birth rate (this is widely believed 
to have been the case in France). The success of a population projection 
as a piece of analysis is not measured by the percentage difference be- 
tween the projected and the actual populations. 

Perhaps the greatest achievement so far in the field of population 
forecasting was a projection constructed by Edwin Cannan [2] in 
1895. At a time when the population of England and Wales was still 
growing by more than 10 per cent per decade with no sign of slackening,

3 The relation to the facts of projections of the U. S. population has been extensively examined by
Davis [4] and Dorn [5]. That the situation in Western Europe is equally unsatisfactory may easily be
seen, for example, by studying the collections of projections for various countries by different authors
which were assembled by Glass [7] and Notestein and others [13]. The latter volume contains an Appendix
which presents by means of graphs the results of earlier forecasts for the individual countries of
Europe. It also contains a full bibliography of earlier projections.

and population forecasts were commonly based on the assumption
that this rate would continue, Cannan predicted that growth would 
cease in the 20th century. He used essentially modern methods. He 
diminished the population of each age group by allowance for mortality 
and added an allowance for births based on the ratio of births to 
persons of reproductive age. He arrived at a prediction which was at 
variance with commonly accepted ideas of his time⁴ and foresaw an
important development in population growth by an analysis of the 
factors at work. As a result he could publish 36 years later, in 1931, a 
paper which said in effect “I told you so” [1]. 

Yet the accuracy of his forecast was not outstanding. By 1911, only 
fifteen years after the forecast was made, the population enumerated 
at the census exceeded the prediction by 7 per cent and by 1916 the 
population had increased beyond his estimate of the maximum it would 
ever reach.⁵

Cannan’s work could not be regarded as useful from the point of 
view of the practical consumer of forecasts in his day. He condemned, 
as having no rational foundation, the extrapolation of the growth 
rate of the last intercensal period. However, if one carries the popula- 
tion forward for two decades from the 1891 census, using the 1881-91 
growth rate of 11.65 per cent, one obtains for 1911 a figure which 
differs from the census count by less than one quarter of 1 per cent. 
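
The comparison can be checked with a few lines of arithmetic. The sketch below assumes the commonly cited census totals for England and Wales in 1891 and 1911; those two totals are not quoted in this paper and are supplied here only for illustration. The growth rate of 11.65 per cent is the one given above.

```python
# Census totals for England and Wales. These figures are not quoted in the
# paper; they are supplied here as assumptions for illustration.
CENSUS_1891 = 29_002_525
CENSUS_1911 = 36_070_492

DECADE_GROWTH = 0.1165   # the 1881-91 intercensal growth rate cited in the text

def extrapolate(base, rate_per_decade, decades):
    """Carry a population forward at a constant decennial growth rate."""
    return base * (1.0 + rate_per_decade) ** decades

# Two decades forward from the 1891 census
naive_1911 = extrapolate(CENSUS_1891, DECADE_GROWTH, 2)
relative_error = (naive_1911 - CENSUS_1911) / CENSUS_1911

print(f"naive figure for 1911:  {naive_1911:,.0f}")
print(f"difference from census: {relative_error:+.2%}")
```

On these assumed totals the naive figure exceeds the 1911 count by a little under one quarter of 1 per cent, which agrees with the statement above.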

Projections of greater accuracy than Cannan’s are not rare in the 
history of population forecasting. This can easily be illustrated from 
the survey of population projections for the United States which was 
made by Dorn [5] in 1950. He gives many comparisons between actual 
and projected populations. Particularly instructive instances are pro- 
vided by two projections published in the 1920’s. The first, by Pearl 
and Reed, was constructed by fitting a logistic curve to the census 
counts of 1790 to 1910. The projections came within one per cent of 
the 1920 and 1930 census totals, overestimated the 1940 population
by 3.5 per cent and underestimated the 1950 population by about 1 
per cent. 

The second projection was published by Whelpton in 1928 and was 





4 He wrote for example: “During the last twenty years most of us have not succeeded in detecting 
any considerable change in the manners and customs and practices which affect natality, and yet it 
only requires a continuance of the change which has undoubtedly been going on to bring about a state 
of things which could cause the possibility of a decline in population, instead of the possibility of over- 
population, to be the bugbear of alarmists.” (This was written in 1895!) 

5 He did not in his 1895 paper actually give any projected figures except for the maximum popula- 
tion. I have deduced his forecast for 1911 from the graph he gives. He had no illusions about the value 
of working out precise figures of future populations; “The value of the diagram lies not in its prediction 
of a maximum population of thirty-seven millions, but in the fact that it shows how a cessation of growth 
may be reached within no very long period without any violent or unnatural changes.” 
made by the modern component method. It gave results very similar
to the Pearl-Reed logistic. The total population predicted exceeded 
the census figure by five per cent in 1940, but for 1950 was within 1 per 
cent of the census count. But these authors themselves proved that 
their success was only accidental. As a consequence of the rapid fall in 
the birth rate they came to feel that their projections were too high. 
Whelpton, and later Pearl and Reed, issued revised estimates which 
were lower than the original ones and turned out to be in worse agreement
with the facts.⁷

The sort of achievement which Cannan accomplished in 1895 is, of 
necessity, rare. It can only occur by a meeting of the opportunity and 
the prepared mind. Its function is not to produce a confident prediction 
(this I believe to be impossible), but to instill skepticism about other 
people’s confident predictions, to reveal by detailed analysis that the 
implications of the present statistics have been misinterpreted, to show 
that by “continuance” of what has been going on and “without any 
violent or unnatural changes” (in Cannan’s words) the future may 
turn out to be very different from what is commonly supposed. This 
is achievement of a high order. 

At the same time, this is not the kind of excellence which is required 
by most consumers of population projections. What they need is fore- 
casts (generally relatively short-term forecasts) which differ by only a 
small percentage from the actual population, whether they be based on 
a correct appreciation of the forces at work or on black magic. The 
elaborate sets of projections often prepared by demographers are not 
well suited to this need, not only because they are not very accurate, 
but because, owing to the amount of work which their preparation 
involves, they are often a year or two behind the latest data by the 
time they are published and cannot very well be kept up-to-date. 
This can be a serious disadvantage in a period of sharp fluctuations. 

In recent years a person needing a population forecast for, say, five 
years ahead, would often have done better to use the latest data and 
make a crude guess, rather than rely on a population projection pub- 
lished a few months earlier. Of course, the non-specialist will often find 
it inconvenient to find his way about population statistics, and one of 
the services which demographers or a central statistical agency can 
perform is to supply, at regular intervals, guesses about future population

6 The agreement was much less close for individual age groups.

7 Pearl and Reed, however, in publishing their revised figures, “did not express a clear choice” between
their two logistics though they “seemed to have a slight preference” for the later one. The logistic
fitted to the counts for 1790 to 1910 turned out to be a far better predictor of the 1950 figure than the
curve which also took into account points for 1920, 1930 and 1940!

based on up-to-date information. In Western countries annual
estimates of population by sex and age are now generally prepared as 
a matter of routine. It is only necessary to use the latest estimate to 
calculate “survivors” for a few years ahead, and add in a guess for 
births (and migration if desired). Both the survival rates and the birth 
estimates can be of the crudest kind, but should not be at variance with 
the latest information. Thus the survival rates should not be based 
on an official life table five years old. (In recent times the decline in 
mortality has often exceeded projected mortality declines very soon 
after the projection was made.) There is no reason why the figure for 
births should not be the total for the last five years or something equally 
simple. Quick, crude methods can, of course, be employed also to pro- 
duce a range of predictions rather than a single “best guess.” 
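
A quick, crude calculation of this kind might look as follows. This is a minimal sketch under stated assumptions: every figure is hypothetical (in thousands of persons), only four age groups are shown for brevity, and the function name `crude_projection` is invented for the illustration.

```python
# Hypothetical figures throughout (thousands of persons); not real data.

def crude_projection(pop, survival, births):
    """Advance 5-year age groups by five years.

    pop[i]      : latest estimate for age group i (last group is open-ended)
    survival[i] : share of group i still alive five years later
    births      : guessed total births during the five years
    """
    n = len(pop)
    projected = [0.0] * n
    projected[0] = births  # deaths among the newborn are ignored, for crudeness
    for i in range(n - 1):
        projected[i + 1] += pop[i] * survival[i]
    projected[n - 1] += pop[n - 1] * survival[n - 1]  # open-ended group
    return projected

latest_estimate = [400, 380, 360, 3800]   # ages 0-4, 5-9, 10-14, 15 and over
survival = [0.99, 0.995, 0.995, 0.94]     # crude ratios, assumed to be recent
births_last_5_years = 400                 # simplest guess: the past repeats

# A range of predictions rather than a single "best guess".
totals = {}
for label, factor in [("low", 0.9), ("central", 1.0), ("high", 1.1)]:
    result = crude_projection(latest_estimate, survival, births_last_5_years * factor)
    totals[label] = sum(result)
    print(f"{label:8s}{totals[label]:10.1f}")
```

The three birth guesses differ by 10 per cent either way, an arbitrary choice; the point is only that a range costs no more labor than a single figure.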

In some circumstances, the fitting of simple growth curves⁸ will be
useful. On the whole, however, growth curves are most likely to find 
application in under-developed countries with poor statistics. Projec- 
tion under these conditions is a separate subject outside the scope of 
this paper. 

In spite of the fate of most of the projections done in the period 
1930-1945, sets of quite elaborate projections have continued to be 
produced in recent years. In most cases the technique of computation 
is the same as that used mainly in the 1930’s, i.e., to start with a base 
population divided by age and sex, work out the number of survivors 
by means of survival ratios based on age-specific mortality rates and 
add in births by applying age-specific fertility rates to the female 
population. (Generally the calculation is carried out with rates specific 
by 5-year age groups, yielding future population figures at 5-year 
intervals.)
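
One five-year step of the technique just described can be sketched as follows. This is an illustration under stated assumptions, not a reconstruction of any published projection: all populations and rates are hypothetical, births are computed from the women present at the start of the period, and the share of female births (0.488) and the survival of the newborn (0.97) are assumed values.

```python
# Hypothetical inputs throughout (populations in thousands); not real data.

def project_step(females, males, surv_f, surv_m, asfr,
                 share_female_births=0.488, birth_survival=0.97):
    """Advance a population in 5-year age groups by five years.

    asfr[i] is the assumed annual fertility rate of women in age group i.
    The last age group is open-ended.
    """
    n = len(females)
    # Births over five years: annual age-specific rates applied to the women
    # at the start of the period (a simplification).
    births = 5.0 * sum(rate * women for rate, women in zip(asfr, females))

    def age_forward(pop, surv, newborns):
        out = [newborns * birth_survival]                 # new 0-4 group
        out += [pop[i] * surv[i] for i in range(n - 2)]   # groups moving up
        # The open-ended group receives the group below plus its own survivors.
        out.append(pop[n - 2] * surv[n - 2] + pop[n - 1] * surv[n - 1])
        return out

    new_f = age_forward(females, surv_f, births * share_female_births)
    new_m = age_forward(males, surv_m, births * (1.0 - share_female_births))
    return new_f, new_m, births

# Age groups 0-4, 5-9, ..., 40-44, and 45 and over (ten groups)
females = [100.0] * 9 + [600.0]
males = [105.0] * 9 + [550.0]
surv_f = [0.995] * 9 + [0.85]
surv_m = [0.994] * 9 + [0.82]
asfr = [0, 0, 0, 0.02, 0.12, 0.14, 0.09, 0.05, 0.01, 0]

new_f, new_m, births = project_step(females, males, surv_f, surv_m, asfr)
print(f"births in period: {births:.1f}")
print(f"total population: {sum(new_f) + sum(new_m):.1f}")
```

Repeating the step with the output as the new base population carries the projection forward at 5-year intervals.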

We may begin by asking why this technique is an improvement over 
the older methods of fitting curves to total population, or extrapolating
rates of growth or crude birth and death rates. The standard modern 
technique has, I think, two main advantages, which are related to 
each other. The first is that this method reveals certain future develop- 
ments which are “inherent” in the present age distribution. It is this 





8 However, the special prominence of the logistic curve in this field seems exaggerated. (a) Some 
writers appear to believe that an S-shaped curve is necessarily a logistic. In fact, even the biological 
data on which the reputation of the logistic is partly based, can be equally closely fitted by other S-shaped 
curves with the same number of arbitrary parameters [6]. (b) Fitting a logistic to a set of data is not 
a determinate process. It is often possible to fit two different logistics to a given set of figures such that 
both give a very good fit and yet they give very different results when extrapolated. This is true even 
when several observations on both sides of the point of inflection are available (see, for example, reference 
[15] particularly p. 35). When, as is so often the case in demography, all the observed figures lie on one 
side of the point of inflection, a wide variety of predictions can generally be produced by fitting a logistic. 
point which largely explains the great significance which has been
attached to this method of projection. Analysis of the effects of the age 
distribution on the crude birth and death rates made possible the 
predictions of population decline which captured the public imagina- 
tion. The second advantage consists in the correspondence which is 
believed to exist between age-specific mortality and fertility rates and 
the causal factors about which forecasters feel they can prognosticate. 
For example, suppose that a person forecasting the population of Eng- 
land 30 years ago had reasoned correctly that mortality would prob- 
ably fall greatly, owing to increases in the general standard of life, the 
development of medical services, etc. If he had concluded that the 
crude death rate would fall he would have been wrong. While age 
specific death rates have fallen dramatically the crude death rate has 
hardly changed owing to shifts in the age structure. 
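
A small numerical illustration may make the point concrete. The two-group population and all rates below are hypothetical, chosen only to show how a shift in age structure can offset a fall in every age-specific rate.

```python
# Hypothetical two-age-group population (thousands); no figure here is real data.

def crude_death_rate(pop_by_age, rate_by_age):
    """Total deaths divided by total population."""
    deaths = sum(p * r for p, r in zip(pop_by_age, rate_by_age))
    return deaths / sum(pop_by_age)

# Earlier date: a young population with high age-specific death rates
pop_then = [900, 100]            # under 65, 65 and over
rates_then = [0.008, 0.080]      # annual deaths per person

# Later date: every age-specific rate is lower, but the population is older
pop_now = [800, 200]
rates_now = [0.005, 0.056]

cdr_then = crude_death_rate(pop_then, rates_then)
cdr_now = crude_death_rate(pop_now, rates_now)

print(f"crude death rate then: {1000 * cdr_then:.1f} per thousand")
print(f"crude death rate now:  {1000 * cdr_now:.1f} per thousand")
```

In this constructed case the age-specific rates fall by about a third, while the crude rate is the same at both dates.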

These are considerable advantages. Yet the usual manner of fore- 
casting births in more elaborate projections hardly seems adequate to 
present day needs. No doubt the number of births is influenced by the 
number of women of child-bearing age; for example, an upper limit is 
thus set to the number of births that can occur. But, within the range 
of variation which it is of interest to predict, the relationship between 
changes in these two factors over time is not close. For example, in 
England (and some other countries) the recent “baby boom,” which 
occurred in a time when the number of women aged 20-40 was begin- 
ning to fall, followed upon a period when an increasing number of 
women was accompanied by a declining number of births. It is very 
strange that so many forecasters have expended great computational 
labor on taking account of the effect on future numbers of births of the 
number of women in each five year age group, while entirely neglecting 
other factors on which statistical information was readily at hand. Such 
factors are: impending change in the ratio of men to women, abnormal 
weighting of the married population with recent marriages, sharp 
diminution in single population leading one to expect a decline in the 
number of marriages, large fluctuations in the distribution of births by 
parity in the last few years, etc. Even crude calculations to illustrate 
the effect which those various influences might have on future numbers 
of births seem more worthwhile than elaborate extrapolations of age 
specific fertility rates. 

The justification for preparing complex population projections must 
be the expectation of following in Cannan’s footsteps. The forecaster 
must be inspired by hope of analytical insight rather than of accurate 
prediction and practical help to administrators. The technique used
must be appropriate to this aim. 

Techniques of projecting into the future are, whether by design or 
not, intimately connected with methods of analyzing the past. The 
standard technique of projecting births arose out of the habit of 
analyzing fertility in terms of the age-specific rates of women, which 
were added up into gross and net reproduction rates. It is not possible 
today to frame sensible assumptions about future fertility in terms of 
the traditional age-specific rates or gross reproduction rates. Straight- 
forward extrapolation of the past trend—a long-term decline followed 
by sharp fluctuations—is meaningless. One may try to draw inferences 
about the factors influencing people to want more or fewer children, 
but the experience of the past has shown that it would be just as wrong 
to pass from the belief that “people will want fewer children” to the 
conclusion that “the gross reproduction rate will fall,” as it would be 
to predict a fall in the crude death rate from an improvement in health 
conditions. To put the same point in another way, to assume mainte- 
nance of current age-specific fertility rates may be to imply very 
strange assumptions about the size of family, in the same way as 
continuance of the crude death rate may involve unlikely and unin- 
tended assumptions about mortality. 

Some pre-war projections went beyond the simple age-specific fer- 
tility rates in forecasting the number of births—for example, by making 
adjustments for variations in the ratio of men to women or projecting 
the proportions of women married and applying separate fertility 
rates to married and unmarried women.⁹ Recently the efforts at applying 
novel techniques for analyzing the number of births have occasionally 
found expression in forecasting. Thus births have been projected 
by forecasting the number of marriages and then forecasting 
the births from the marriages [14, 10, and 9]. The procedure of basing 
the forecast of marriages on the population of single persons has com- 
mended itself, since in recent years there have been sharp decreases 
in the proportions remaining single in various age groups, which sug- 
gests that a fall in the number of marriages is in prospect. To pass 
from marriages to births, assumptions may be made concerning (a) the 
average number of births to be produced by each marriage and (b) the 
distribution of these births over the years subsequent to the marriage. 





9 This is, however, often an undesirable method of taking account of variations in marriage patterns 
in projections, as also in computing reproduction rates (the reasons are discussed, in the latter context, in 
[8] para. 128). 
The procedure may be varied by projecting first births from marriages, 
second births from first births and so forth, thus using directly the 
known numbers of first, second, etc. births, as well as of marriages, in 
the recent past. This requires assumptions concerning (a) the number 
of births of order n+1 per birth of order n and (b) the distribution 
of the births of order n+1 over the years subsequent to the occurrence 
of the births of order n. Another procedure, following a different line 
of approach, is to use the traditional age-specific female fertility rates, 
but arrange them in terms of successive cohorts and make assumptions 
about the total number of births of cohorts. Here again births of 
different order may be treated separately (see [17] which is summarized 
in [11]). 
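
The marriage-based procedure described above amounts to spreading the births expected from each year's marriages over the following years. The Python sketch below is illustrative only: the marriage counts, the average of 2.2 births per marriage, and the timing shares are hypothetical values, not figures from any of the projections cited.

```python
def project_births(marriages_by_year, births_per_marriage, timing):
    """Project births from marriages.

    marriages_by_year: dict {year: number of marriages}
    births_per_marriage: assumption (a), average births per marriage
    timing: assumption (b), share of a marriage's births occurring
            k years after the marriage (list indexed by k, summing to 1)
    """
    births = {}
    for year, marriages in marriages_by_year.items():
        for k, share in enumerate(timing):
            contribution = marriages * births_per_marriage * share
            births[year + k] = births.get(year + k, 0.0) + contribution
    return births


# Hypothetical inputs, for illustration only.
marriages = {1950: 1000, 1951: 900}
timing = [0.1, 0.3, 0.3, 0.2, 0.1]
projected = project_births(marriages, 2.2, timing)
```

The birth-order variant has the same shape: pass births of order n in place of marriages, and the assumed number of births of order n+1 per birth of order n in place of the births per marriage.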

Many other procedures, ranging from simple methods such as 
projecting births on the basis of the number of men, to complex compu- 
tations, are implicit in modern methods of analyzing fertility. In one 
sense, work on these lines will make the interpretation of forecasts 
more bewildering. New techniques may suggest the possibility of 
novel patterns of population change different from those predicted by 
traditional methods. For example, where two projections have been 
made by the traditional method, using different fertility assumptions, 
the difference between the resulting numbers of births is slight at first 
and then increases indefinitely, the direction of the difference being 
constant. If projections are made using the cohort principle¹⁰ and 
varying the distribution of births over the lifetime of different cohorts, 
this will no longer hold and quite natural assumptions may result in 
completely different patterns of variation. Projected births may differ 
widely even for the first few years. Moreover, it may be that on one 
assumption the births will be lower in the first time period, but higher 
in the second time period, than under another assumption as to fer- 
tility behavior. Such patterns may make the estimation of maximum 
and minimum figures within which the population will lie a complex 
problem from the purely technical point of view. 
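
The crossing pattern mentioned above can be seen in a minimal Python sketch. It assumes a single hypothetical cohort of 1,000 women with the same completed family size under both assumptions, and only the timing of births differs; all numbers are invented for illustration.

```python
def births_by_period(cohort_size, completed_births, timing):
    """Births of one cohort in each period, given the share of its
    lifetime births falling in each period (shares sum to 1)."""
    return [cohort_size * completed_births * share for share in timing]


# Hypothetical cohort: same total births, different timing.
early = births_by_period(1000, 2.0, [0.6, 0.4])
late = births_by_period(1000, 2.0, [0.4, 0.6])

# Positive in the first period, negative in the second: the sign of the
# difference between the two projections is not constant.
differences = [e - l for e, l in zip(early, late)]
```

Under the traditional method the difference between two fertility assumptions keeps one sign throughout; here it reverses although the cohort's total births are identical.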

The applicability of techniques of forecasting depends, of course, in 
part on their complexity and on the availability of the statistical data. 
Some methods require materials which are only rarely available. 
Principally, however, the interest of a method depends upon what has 
been going on in the population to which it is to be applied. Analysis 
may reveal features in the recent history and structure of the popula- 





10 By cohort principle, I mean a method which may make more births in one time period result in 
fewer births in the next, i.e., which allows for shifts in the distribution of births over the lifetime of successive 
cohorts. In this sense all the methods discussed in the previous paragraph embody something of 
the cohort principle. 
tion which can be expected to have effects on future births, effects 
which are not adequately taken into account by traditional methods. 
The problem of the availability of data thus tends to solve itself. If 
there are data to show the effect of a certain change on births in the 
recent past, these data can almost always be used for a calculation con- 
cerning the future. 

It would be pointless to try to find the best method or the right 
method of projecting births, just as, I believe, it is pointless to look 
for the right method of measuring fertility. In fact the two are the 
same thing; a technique of projection implies a technique of measure- 
ment and vice versa. Moreover, the need is not for elaborate projections 
by this or that particular new method. Rather each forecaster should 
try to use the information available to him to the best advantage to 
throw light on possible future development. This may involve trying 
several methods, if necessary by rather crude computations. In addi- 
tion, it would be desirable to have some special studies especially to 
compare the properties of various techniques of projection and to 
devise short-cut methods of calculation, so that the forecaster may 
have a better view of the various tools at his disposal and be able to 
apply them without prohibitive expenditure of time. 

If there is a general lesson to be drawn from all this, it is, I think, 
first that as little forecasting as possible should be done, and second 
that, if a forecast (more elaborate than the quick calculation discussed 
earlier) is undertaken, it should involve less computation and more 
cogitation than has generally been applied. Forecasts should flow from 
the analysis of the past. Anyone who has not bothered with analysis 
should not forecast. The labor spent in doing elaborate projections on 
a variety of assumptions by a ready-made technique would often be 
much better employed in a study of the past. Out of such study may 
occasionally come important insights about unexpected possibilities in 
the future. 
A TECHNIQUE FOR ESTIMATING THE POPULATION 
OF COUNTIES 


Hugh H. Brown 
California Taxpayers’ Association 


CURRENT estimates of the population of counties are desired for a 
number of purposes which, however, are so widely recognized 
that discussion of them seems superfluous. Several methods have been 
developed for making postcensal estimates of county populations [6, 7, 
9, 13]. This article outlines another which may be applied when cer- 
tain statistical data are available, most important of which are school 
enrolments by grade and births by place of residence. Also desired 
are deaths by place of residence, voter registrations, and automobile 
registrations. The estimating procedure derives from the logic of the 
problem and the materials. 


TERMS AND FORMULA 


The basic statistical series used for current population estimates is 
the count of children in grades one through eight. This is identified for 
formula purposes as the school count. The ratio between the total 
population of a county and the number of children in grade school at 
the time of the 1950 Census is easily calculated, and is referred to as 
the Census ratio. Unfortunately, that ratio is constantly changing as 
the result of variations six to fourteen years earlier in the birth rate 
together with modifications resulting from migration, economic 
activities, and other influences. Consequently, to produce current esti- 
mates of population based on elementary grade counts, it is necessary 
to introduce a third element—a correction factor. 

The theoretical formula is, then: school count × Census ratio × correction 
factor = population estimate. 

In practice, the correction factor is composed of two factors to which, 
for identification, have been given the names of “ratio change factor” 
and “conversion factor.” The ratio change factor compensates for 
changes in the Census ratio. The conversion factor corrects for an error 
in the ratio change factor, thus “converting” it to a correction factor. 
Lastly, certain adjustments are made as the result of specific tests. 
In its full statement, then, the formula is: school count × Census ratio 
× ratio change factor × conversion factor + adjustments = population 
estimate. 

How each of these components is derived and used is explained in the 
following paragraphs. 
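
As a minimal sketch, the full formula can be written as a single Python function. The function and parameter names are illustrative; the numbers are the statewide January 1950 figures given in Table 3 of this article, with adjustments taken as zero.

```python
def population_estimate(school_count, census_ratio, ratio_change_factor,
                        conversion_factor, adjustments=0.0):
    """School count x Census ratio x ratio change factor
    x conversion factor + adjustments."""
    return (school_count * census_ratio * ratio_change_factor
            * conversion_factor + adjustments)


# Statewide figures for January 1950, as listed in Table 3.
estimate = population_estimate(
    school_count=1_237_263,
    census_ratio=8.50867,
    ratio_change_factor=0.98885,
    conversion_factor=1.00920,
)
```

The result falls within a few dozen persons of the intercensal estimate of 10,505,896; the small gap comes from rounding in the published factors.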
SCHOOL COUNT 


California public schools count active enrollees as of each March 31 
and October 31. Counts are collected and tabulated by the State 
Department of Education and are published in their monthly bulletin, 
“California Schools.” The published summaries show totals by county 
by grade by sex from kindergarten through grade 14 and in special 
classes. 

In early 1950, the California Taxpayers’ Association canvassed 
church organizations to find those operating regular parochial day 
schools and boarding schools. Those identified report to the Association 
their end-of-school year (June) and beginning-of-school year (Septem- 
ber) counts which the Association summarizes and releases by county 
by grade from kindergarten through grade 14. 

For population estimating purposes, parochial and public school 
counts for grades one through eight are combined, including elementary 
level classes for ungraded, physically handicapped, and mentally re- 
tarded pupils. The parochial schools follow the State Department of 
Education’s manual of rules and regulations controlling age of entry 
and other requirements. Essentially, then, the school counts provide 
a statistical series which is consistent from county to county and year 
to year. Attendance at school by children in the elementary school 
ages is virtually at saturation in California though there is a small 
range of difference between industrial and agricultural areas, attend- 
ance being less complete in the latter. From 1940 to 1950, there was 
some improvement in completeness of attendance and there may be 
more, particularly in the agricultural areas, between 1950 and 1960, 
but it will probably be slight. It is assumed, for the present, that the 
character of each county with respect to completeness of attendance is 
stable. 

One inconsistency has already occurred since 1950, however. The 
State legislature changed the age minimum for entrance into the first 
grade from 5 years 6 months to 5 years 9 months, effective in the fall 
of 1952. It is necessary to estimate the number of students excluded by 
the new rule. The estimate is made by reference to the corresponding 
births. 

In order to provide the school count in the form needed for current 
population estimates as of each January 1, the total counts of March 
and October are projected two months to January 1 of the next calen- 
dar year. This leads to a slight overstatement as there is a tendency 
for children to be held back during the spring and summer months and 
entered in a wave in the fall. Various other factors, of which seasonal 
agricultural migration is the most important, also affect the two counts. 
Each year a preliminary school count is produced by the March through 
October projection and each following year a revision is produced by 
differencing to January between the prior October and the new March 
counts. This is more accurate and minimizes distortions due to seasonal 
agricultural migration and the tendency to hold new students back for 
the fall session. Preliminary population estimates based on the two- 
months projection are revised the following year according to the re- 
vised school count. 
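
The article does not spell out the arithmetic of the projection and the revision. The Python sketch below assumes simple straight-line movement between the March 31 and October 31 counts, which is an assumption of this example and not a statement of the Association's procedure; the counts are hypothetical.

```python
def preliminary_january_count(march_count, october_count):
    """Extend the March 31 to October 31 trend (7 months) two months
    further, to the following January 1. Linear trend is assumed."""
    monthly_change = (october_count - march_count) / 7
    return october_count + 2 * monthly_change


def revised_january_count(prior_october_count, new_march_count):
    """Interpolate to January 1 between the prior October 31 count and
    the new March 31 count (5 months apart). Linear path is assumed."""
    monthly_change = (new_march_count - prior_october_count) / 5
    return prior_october_count + 2 * monthly_change


# Hypothetical counts, for illustration only.
preliminary = preliminary_january_count(100_000, 103_500)
revised = revised_january_count(103_500, 105_000)
```

With these figures the preliminary count exceeds the revised one, which is the direction of error the text attributes to the fall wave of entrants.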

Several years of experience in relating the projected to the revised 
school counts now make possible the determination of adjustments for 
seasonal variations in counties where the pattern is fairly stable. These 
are applied to each new projection of the preliminary count to make it 
more closely approximate what the later revision will be. 


CENSUS RATIO 


The Census ratios were calculated by dividing the 1950 Census popu- 
lation of each county by the 1950 spring school counts. These ratios 
were studied by arraying them from smallest to largest and differentiat- 
ing them into six groups. They range from Madera County with a 
ratio of 5.58031 to San Francisco with a ratio of 12.49347. This array 
gives a basic distribution for a type of testing called “pattern testing.” 
The six groupings are arbitrary but are based on demographic and 
economic characteristics of the counties. Group 1 consists of the four 
major urban-industrial counties. Groups 2 and 3 are semi-urban coun- 
ties with mixed industrial and agricultural economies, Group 2 being 
in the San Francisco Bay Area and Group 3 in Southern California. 
Group 4 is Sacramento Valley counties, and Group 5 is San Joaquin 
Valley counties, all primarily agricultural but with important indus- 
trial developments in spots. Group 6 is all others. 

At the top of the table with the lowest ratios are the counties with 
relatively the greatest number of children in the elementary school ages 
while at the bottom of the table are those with the least. This basic 
characteristic derives, of course, from the age distribution at the time 
of the Census, which derives from the level of the birth rate six to 
fourteen years earlier, which, in turn, derives from the economic and 
social character of the county. The top of the table is the extreme of 
ruralness, agriculturalism, and youthfulness of the population while 
the bottom is the extreme of urbanness, industrialization, and aging 
of the population. 

The way the groups fall in this demographic array, the sequence of 
the counties within each group, and the interlacing of the groups provide 
patterns which are of importance in spotting changes in demographic 
character when “pattern testing” is applied to other materials 
used in the estimating process. The research statistician becomes famili- 
ar with these patterns and can recognize deviations from them or 
variations of them. In terms of principle, where there are no absolute 
yardsticks or standards of what this or that demographic characteristic 
should be, significant information may be derived on a relative basis 
from interrelationships among the counties. 


RATIO CHANGE FACTOR 


The Census ratios—which, again, are the ratio of the total population 
to pupils in elementary school grades in April, 1950—may be said to 
measure the relative density of children in that age bracket within the 
county populations. If the ratios remained constant, population esti- 
mating based on school counts would be simple, but, as already stated, 
they do not. They change continuously and “ratio change factors” 
are necessary to compensate. 

The ratio change factors are produced from birth rates. The birth 
rate of a specific year reflects the relative number of children born into 
the population in that year and that relative relationship continues 
sufficiently unchanged during the six to fourteen years the children 
are in elementary school to make its use in this application practicable. 
For working purposes, the children who were in school in April, 1950, 
are considered as having been 6 to 14 years old and as having been born 
during the years 1936 to 1944, inclusive. This is not exactly true but 
it is practical since it permits use of annual totals of births and pre- 
cludes the necessity for totaling births by 12-month periods including 
parts of calendar years. Moreover, errors resulting from this inaccuracy 
are compensated for in ways still to be explained. The age span of six 
to fourteen years is used because it gave better results when checked 
against the 1940 Census data than the six to thirteen year spread. 
Consequently, it was adopted for use during the 1950-60 decade. 

For each school year, the birth rates for the nine years six to fourteen 
years earlier are added together to give a specific sum for that year. 
Thus, adding the birth rates together for 1936 to 1944, inclusive, gives 
a total of 152.50 for July, 1950. The birth rates are considered to be as 
of July 1 since they are computed from place-of-residence births for 
each calendar year and an estimate of population as of July 1. The 
purpose of these figures will become apparent by reference to Table 1. 
Note the total of 152.50 for July, 1950 in column 3. This is the total of 
the birth rates for 1936 through 1944 as shown in column 2. 
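
The nine-year window can be sketched in Python as follows. The function names are illustrative, and the flat birth rate of 15.0 used in the example is synthetic, chosen only to show the mechanics.

```python
def years_included(school_year):
    """Birth years six to fourteen years before the school year
    (nine calendar years, both ends inclusive)."""
    return list(range(school_year - 14, school_year - 5))


def cumulated_rate(birth_rates, school_year):
    """Sum of the annual birth rates over the nine included years."""
    return sum(birth_rates[year] for year in years_included(school_year))


# Synthetic flat rates, for illustration only.
flat_rates = {year: 15.0 for year in range(1936, 1945)}
window_1950 = years_included(1950)
total_1950 = cumulated_rate(flat_rates, 1950)
```

For 1950 the window runs from 1936 through 1944, matching the years cumulated in the text.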





TABLE 1 
CUMULATED BIRTH RATES, CALIFORNIA 

                            Cumulated Rates,             Birth Rate 
                           6-14 Years Earlier              Ratio* 
               Birth       Years       Cumulated       1940      1950 
Year           Rate       Included       Rates         Base      Base 
(1)             (2)         (3)           (4)           (5)       (6) 

1925           17.60 
1926           16.34 
1927           16.32 
1928           15.62 
1929           14. 
1930           14. 
1931 
1932           13. 
1933           12. 
1934 
1935           13. 
1936           13. 
1937           14. 
1938           15. 
1939           15.        1925-33        135.50 
1940 (April)                             132.08 
1940           16.        1926-34        130.94 
1941           17.        1927-35        127.78 
1942           19.        1928-36        124.97 
1943           20.        1929-37        124.02 
1944           19.        1930-38        124.59 
1945           19.        1931-39        125.02 
1946           22.        1932-40        127.05 
1947                      1933-41        131.04 
1948                      1934-42        138.29 
1949                      1935-43        145.70 
1950 (April)                             150.80 
1950                      1936-44        152.50 
1951                      1937-45        158.49 
1952                      1938-46        166.50 
1953                      1939-47        176.03 
1954                      1940-48        184.65 
1955                      1941-49        192.23 

* Base period cumulative rates divided by those of each year. 
We may regard the sum of 152.50 as an indicator of relative density of 
children of elementary grade ages in the population of California as of 
July, 1950. The sum for July, 1949 is 145.70. Proportioning the differ- 
ence between the 1949 and 1950 figures gives a sum of 150.80 as of 
April 1, 1950, the date of the Census. This, then, is the base for the 
1950-60 decade. Note next that the sum for 1954 is 184.65. This demon- 
strates that the relative number of children of grade school age in the 
population was substantially greater in 1954 than it was in 1950. For 
population estimating purposes, it means that the 1950 Census ratio 
must be reduced to compensate for the change in relative density of 
grade school children. The reduction is made by calculating the ratio 
of change to each succeeding year from the base year and multiplying 
the product of school count times Census ratio by its reciprocal. For 
working purposes, we skip one division and calculate the ratio of the 
base year to each succeeding year directly. Thus, 150.80/184.65 
=.81668. Consequently, estimates made by multiplying the 1954 
school count by the 1950 Census ratio must also be multiplied by 
.81668 to correct for change in the ratio between population and grade 
school count. 
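
The arithmetic of this paragraph can be checked with a short Python sketch. The interpolation weight of three quarters (April 1 falling nine months after July 1) is inferred here from the stated result of 150.80 and is not given explicitly in the text; the function names are illustrative.

```python
def interpolate_to_april(previous_july_sum, next_july_sum):
    """April 1 lies three quarters of the way from one July 1 to the next."""
    return previous_july_sum + 0.75 * (next_july_sum - previous_july_sum)


def ratio_change_factor(base_sum, year_sum):
    """Base-period cumulated rate divided by the cumulated rate of the year."""
    return base_sum / year_sum


base_1950 = interpolate_to_april(145.70, 152.50)      # about 150.80
factor_1954 = ratio_change_factor(base_1950, 184.65)  # about .81668
```

A factor below unity lowers the estimate, reflecting the greater density of grade school children in 1954 than at the Census date.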

The series with 1950 as the base year for use during the current 
decade is shown in column 6. The density of children of grade school 
ages within the population is now rising, as we know it must because of 
the much publicized “birth wave.” If the birth rates, the cumulated 
rates, and the ratios to the base period are all plotted on the same 
chart, these relationships and the shapes of the curves become apparent. 

Census ratios and the cumulated birth rates are both measures of 
the relative density of children of elementary school age within the 
population. The Census ratios are one-time items, however, while the 
cumulated rates are a continuous series. It is possible, therefore, using 
the cumulated rates series, to calculate the ratio between a base year 
and any specific year; the reciprocal of this ratio is, then, an index 
of change of density which can be applied to the Census ratio. 

It is clear that the ratio change factor must be a multiplier which can 
vary above and below unity, so that in a phase following increasing 
birth rates, such as the present, it can decline and fall below unity 
while in a phase following decreasing birth rates, such as during the 
1940’s, it can rise and become greater than unity. The ratio to base 
period of the cumulative birth rates meets this requirement. It rises 
and falls pursuant to birth experience six to fourteen years earlier. 

Ratios with 1940 as the base (actually in reciprocal form, of course) 
are shown in column 5. The cumulated rates as of April, 1940 were 
132.08. In 1943, the density of children was at its lowest, the sum was 
124.02 and the corresponding change factor was 1.06499. After 1943, 
this multiplier diminished until it passed through unity (the point of 
zero change) in 1948, and for April, 1950, it was .87586. Now, if we 
were to check the accuracy of this method of estimating the population 
of each county against the 1950 Census by using 1950 school counts, 
the 1940 Census ratios and correction factors, the county counter- 
parts of this statewide factor would be the correction factors to use. 
Analysis indicates that these ratios do, in fact, come very close to 
being what such correction factors should be, but it is impossible to 
make a conclusive study because the method of counting children in 
school was changed in 1947. Nevertheless, analysis demonstrates that 
ratios calculated from cumulated birth rates come close enough for 
practical purposes. 

The series indicated in column 6 of Table 1 has been calculated for all 
of the 58 counties to produce a ratio change factor for each county 
each year beginning with 1950. Ratios for 1954 are shown on Table 2 
which distributes the counties in groups as mentioned earlier under the 
term of “pattern testing.” The table, in essence, shows relative change 
in the Census ratios from 1950 to 1954. Every county has its own 
characteristics, its own relative density of children and, hence, its own 
degree of change. As with the Census ratios, there is a wide range from 
highest ratio change factor to lowest. 

It is of interest to note that the more urban type of county runs at 
the bottom of the scale on Table 2 while the more rural type runs 
toward the top. This means that the Census ratios of the urban-in- 
dustrial type counties must be more sharply discounted than those in 
the rural-agricultural type because the increase in density of children 
due to the rise in the birth rate from its previous levels is relatively 
greater in the urban counties than in the rural ones. 

In using the ratio change factors, it is necessary to make the under- 
lying assumption that the people who come into a county through 
migration are demographically similar to those already there. It is 
thus assumed that city people migrate to San Francisco and farm 
people to Madera county. While this will usually be true, it is not true 
always and the demographic character of a county may change. This 
fact is recognized and necessary adjustments are provided in testing 
procedures which will be explained presently. 

In dealing with materials of this type, one must be constantly aware 
that any specific item for any county in any year may be off-normal. 
The ratio change factor for a particular county may tend to be high or 
TABLE 2 
RATIO CHANGE FACTORS, 1954 

Columns: County; Ratio Change Factors for Group 1, Group 2, Group 3, Group 4, Group 5, Group 6 

County           Ratio Change Factor 

Sierra           1.17541 
Alpine           1.11575 
Mono             1.09705 
Lassen           1.03972 
Nevada           1.00876 
Mariposa          .97754 
El Dorado 
Calaveras 
Plumas 
Amador 
Shasta 
Siskiyou 
Imperial 
Tuolumne 
Monterey 
Del Norte 
Sonoma 
low in a certain year and the pattern testing, as shown on Table 2, 
helps identify such occurrences. As stated earlier, there are no absolute 
standards for these factors, but they may be evaluated relative to each 
other within the group patterns, a county being known by the com- 
pany it keeps. Further, this type of scanning serves the double purpose 
of defending the population estimates and providing confidence for 
them. For example, by itself, it might seem that the ratio change factor 
for San Mateo County (56th line) is too low, but material confidence 
is provided by noting the proximity of those of Santa Clara, Contra 
Costa, San Francisco, Alameda, Solano, and Marin Counties. 

It is obvious that accurate ratio change factors require very accurate 
birth rates, which in turn require very accurate population estimates 
six to fourteen years earlier. It is considered possible to prepare inter- 
censal estimates which are substantially accurate. In 1953, the Cali- 
fornia Taxpayers’ Association re-estimated the population for each 
county for the intercensal years from 1940 to 1950 (also 1930 to 1940), 
using an adaptation of procedures outlined by Donald J. Bogue [1]. 
Estimated intercensal county populations as of January 1 were pub- 
lished [4]. Estimates as of each July 1 were also computed (and mimeo- 
graphed but not published); new birth rates were calculated for each 
year from 1935 onward; and new ratio change factors were produced 
from them. 

Ratio change factors are for the most part rooted back in intercensal 
years for which data should be fairly sound. By 1960, the years included 
will be 1946 to 1954, the last four of which will depend on four years of 
postcensal estimates of population. One, we trust, will not go too far 
wrong in four years, but any errors in population estimates of past 
years will be “inbred” into the change factors. Again, testing still to be 
discussed is designed to offset such errors. 


CONVERSION FACTOR 


The most difficult factor to determine is the “conversion factor.” 
It is so called, as already stated, because it converts the ratio change 
factors into correction factors. 

Necessity for the conversion factor is demonstrated on Table 3 using 
four years of presumably firm experience prior to the 1950 Census. 
Column 1 shows the school counts as of January 1 for the four years of 
1947 to 1950. Column 2 shows the appropriate ratio change factor and 
column 3 shows the product of the school count times the Census ratio 
times the ratio change factor. The products do not quite agree with 
the intercensal estimates of column 4 so we divide the intercensals by 
the products to determine the factors by which the products must be 
multiplied to equal the intercensal estimates. These are shown in column 
5 and prove to be a slowly ascending nearly straight line series. 
Entering these factors on the aforementioned chart showing birth 
rates and ratio change factors assists in visualizing the relationships. 
The birth rates (column 2 of Table 1) declined slowly in a roughly 
straight line to a low in 1933, then rose rapidly to a high in 1947, 
after which they again declined slowly to 1950. With the birth rates 
rising rapidly in this way, it seems not unreasonable that the ratio 


TABLE 3 
CALCULATION OF CONVERSION FACTOR, CALIFORNIA 

                               Ratio 
                  School       Change                     Intercensal    Conversion 
Date              Count        Factors      Product*      Estimate       Factors† 
                   (1)           (2)          (3)            (4)            (5) 

January 1947    1,020,218      1.15079      9,989,660      9,690,560       .97006 
January 1948    1,086,317      1.09046     10,079,245      9,915,820       .98379 
January 1949    1,165,313      1.03500     10,262,298     10,184,600       .99243 
January 1950    1,237,263       .98885     10,410,082     10,505,896      1.00920 

* School count times 1950 Census ratio of 8.50867 times ratio change factor. 
† Column 4 divided by Column 3. 


change factors derived from them would be affected in a manner analo- 
gous to the way arithmetic averages are affected by inclusion of extreme 
items. Thus, the inclusion of new years with very high rates causes 
overstatement of increase of density of children of grade school ages. 
Estimates produced with these ratio change factors are too low and a 
compensating adjustment is needed. A similar correction in the other 
direction would become necessary should the birth rate turn rapidly 
down. Under nearly all conditions other than a stable birth rate, a 
correction will be needed. The conversion factor—one single statewide 
multiplier applied in common to all the county ratio change factors for 
a specific year to raise or lower all alike—is designed for this purpose. 
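
The conversion factors of Table 3 can be reproduced with a short Python sketch. The data are the statewide figures printed in Table 3; the function and variable names are illustrative.

```python
CENSUS_RATIO = 8.50867  # statewide 1950 Census ratio (footnote to Table 3)

# year: (school count, ratio change factor, intercensal estimate), from Table 3
TABLE_3 = {
    1947: (1_020_218, 1.15079, 9_690_560),
    1948: (1_086_317, 1.09046, 9_915_820),
    1949: (1_165_313, 1.03500, 10_184_600),
    1950: (1_237_263, 0.98885, 10_505_896),
}


def conversion_factor(school_count, ratio_change, intercensal,
                      census_ratio=CENSUS_RATIO):
    """Intercensal estimate divided by the uncorrected product
    (school count x Census ratio x ratio change factor)."""
    product = school_count * census_ratio * ratio_change
    return intercensal / product


factors = {year: conversion_factor(*row) for year, row in TABLE_3.items()}
```

The computed values agree with column 5 of Table 3 to within rounding, rising from about .970 in 1947 to about 1.009 in 1950.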

Determination of conversion factors for years subsequent to the 
Census involves several different lines of attack all brought to bear on 
the one central problem. 


1. Extrapolation 


The four years shown on Table 3 serve as a base for projecting the 
conversion factors and the initial step is simply graphic extrapolation 
of the charted line. The projected values are used to produce estimates 
of population for the state as a whole year by year subsequent to the 
Census and those estimates are checked to see if the conversion factors 
are tenable. Latest experience in what is called “the 1954 cycle,” dur- 
ing which the estimates as of January 1, 1955, were produced together 
with revisions for previous years [2] showed that findings stated in 
Table 3 still hold good with minor variations to 1955. Conversion fac- 
tors used in the “1954 cycle” were as follows: 








                                         Conversion    Change in 
                         Year              Factors        Year 

Intercensal experience:  January 1947       .97006 
                         January 1948       .98379       .01373 
                         January 1949       .99243       .00864 
                         January 1950      1.00920       .01677 

Postcensal projection:   January 1951      1.015         .00580 
                         January 1952      1.025         .01000 
                         January 1953      1.030         .00500 
                         January 1954      1.040         .01000 
                         January 1955      1.045         .00500 





The slight lowering from a strictly straight line projection is sup- 
ported by collateral findings. 
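
To see how far the adopted factors fall below a straight line, the four intercensal factors can be fitted and extended numerically. The Python sketch below uses an ordinary least-squares line as a stand-in for the graphic extrapolation described in the text, which is an assumption of this example. It also reads the postcensal factors as 1.015, 1.025, 1.030, 1.040, and 1.045, the values implied by the listed year-to-year changes.

```python
def fit_line(xs, ys):
    """Ordinary least-squares slope and intercept for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Intercensal conversion factors, January 1947 to January 1950 (Table 3).
years = [1947, 1948, 1949, 1950]
observed = [0.97006, 0.98379, 0.99243, 1.00920]
slope, intercept = fit_line(years, observed)

# Factors adopted in the 1954 cycle (read as described in the lead-in).
used = {1951: 1.015, 1952: 1.025, 1953: 1.030, 1954: 1.040, 1955: 1.045}
straight_line = {year: slope * year + intercept for year in used}
gaps = {year: straight_line[year] - used[year] for year in used}
```

Every adopted factor lies below the fitted line, and the gap widens with each year, consistent with the lowering described in the text.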


2. Sum of Birth Rates 


The birth rates dropped and picked up in each successive year’s 
sum of cumulated birth rates are reviewed and their effect on the con- 
version factor considered. On this score, a very slight dropping below 
the straight line projection appears reasonable. 


3. Evaluation of Growth Components 


Growth derives from natural increase and net migration. Statistics 
are available on births and deaths by place of residence but not on in- 
and out-migration. In order to show historical patterns of net migra- 
tion, California Taxpayers’ Association made a study of population, 
natural increase, and net migration for the state and each county for 
the 24 years from 1930 to 1954 [3]. 
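
Since migration is not recorded directly, net migration in a study of this kind is obtained as a residual: population change less natural increase. A minimal Python sketch follows; the function name and the county figures are hypothetical and serve only to show the calculation.

```python
def net_migration(population_start, population_end, births, deaths):
    """Residual estimate: total growth minus natural increase."""
    natural_increase = births - deaths
    return (population_end - population_start) - natural_increase


# Hypothetical county figures for one year, for illustration only.
migration = net_migration(
    population_start=50_000,
    population_end=52_000,
    births=1_200,
    deaths=500,
)
```

Because the residual absorbs any error in the population estimates, an implausible migration figure is itself a signal that an estimate needs review.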

It seems reasonable to hypothesize that the volume of net migration 
is responsive to general economic conditions in the nation at large. 
To affirm or refute the hypothesis, the net migration from the twenty- 
four year study was related to eleven of the statistical series identified 
by the National Bureau of Economic Research as “statistical indicators 
of cyclical revivals and recessions” [12]. The specific series used were 
Percent of Labor Force Unemployed (both National and State), 
Failure Liabilities, New Manufacturing Orders (Durables), Bank Deb- 
its Outside New York City, Industrial Production, Wholesale Prices 
other than Farm Products and Foods, Gross National Product, Average 
Weekly Hours in Manufacturing, New Incorporations, Non-Agricultural 
Employment, and Freight Car Loadings. These are selected 
from the “leading” and “concurring” groups of statistical series. Over 
the long run, the hypothesis that migration is correlated with economic 
activity appears supportable. This conclusion, however, depends again 
not so much on “absolute” considerations of direct linkage as on “rela- 
tive” considerations of turning at the same time in the same direction 
and being “less than” or “more than” a reference period. 

Components of growth since 1940 as developed during the 1954 cycle 
are shown in Table 4. The natural increase has risen steadily except
for 1948. The net migration has ranged from the wartime gain of some 
570,000 in 1942 to the postwar low of about 76,000 in 1947. Gains 
during 1952 which were linked to the rearmament program for the 
Korean War almost equaled the 1942 gains but were not followed in 
1953 by gains corresponding to those of 1943. 
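
Since births and deaths are tabulated but migration is not, net migration in Table 4 is presumably obtained as a residual: total growth less natural increase. The short Python sketch below illustrates that arithmetic with the 1952 and 1953 rows of Table 4; the function and variable names are illustrative assumptions and are not part of the Association's procedure.

```python
# Net migration as a residual: total growth minus natural increase.
# Figures are copied from Table 4 (California, calendar years 1952 and 1953).
# Names are hypothetical; only the subtraction itself comes from the text.

def net_migration(total_growth: int, natural_increase: int) -> int:
    """Return the net migration implied by total growth and natural increase."""
    return total_growth - natural_increase

table4 = {
    1953: {"total": 418_600, "natural_increase": 187_431},
    1952: {"total": 712_900, "natural_increase": 172_898},
}

residuals = {
    year: net_migration(row["total"], row["natural_increase"])
    for year, row in table4.items()
}

for year, value in residuals.items():
    print(year, f"{value:,}")
```

The residuals reproduce the net migration column of Table 4 for those two years.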

In determining the conversion factors for the 1954 cycle, the net 
migration they produced through 1953 was taken into consideration. 
After receipt of the October 1954 school count and completion of all 
calculations for estimates as of January 1, 1955, it was found that the 
produced net migration for 1954, shown on the first line of Table 4, 
was slightly greater than during 1953. At that time (December, 1954), 
the first half to three quarters of 1954 were available for the eleven
“leading” or “concurring” economic indicators. The preponderance of 
evidence indicated a level of economic activity slightly below 1953. 
For consistency of our hypothesis, then, net migration in 1954 should 
be a little less than in 1953 rather than a little more. As of the date of 
writing (January, 1955) there is evidence of economic revival during 
the latter part of 1954. Without elaborating, let us merely observe 
that when the record on 1954 is complete, slightly greater net migration 
may not be an inconsistency. 


4. Border Agricultural Inspections 


All autos entering California are stopped at the border agricultural 
inspection stations. Cars with non-California licenses and their pas- 
sengers are tabulated. Quarterly surveys determine reason for entering 
California such as “vacation,” “moving to California,” etc. With these 
and other data on travel, it is possible to construct an approximation 
of in-migration to the state. There is rough correlation with net migra- 
tion but it takes some interpreting. Again, 1954 travel to the date of 
writing was a little below the 1953 volume. 
TABLE 4

COMPONENTS OF POPULATION GROWTH FOR CALIFORNIA,
1940 TO 1954

                 Components of Growth During Year
Calendar
Year             Total         Natural        Net
                               Increase       Migration

1954             455,400 E     197,105        258,295
1953             418,600       187,431        231,169
1952             712,900       172,898        540,002
1951             465,000       156,489        308,511

1950             276,070       145,874        130,196
1949             271,296       144,615        126,681
1948             268,780       141,093        127,687
1947             225,260       148,577         76,683
1946             231,100       122,866        108,234

1945             307,165        90,236        216,929
1944             406,200        88,321        317,879
1943             602,625        85,311        517,314
1942             640,370        69,716        570,654
1941             391,300        44,247        347,053
1940             243,272        32,545        210,727

E = Estimated.


5. Comparison with Estimates by Other Agencies


Estimates for recent years are compared to estimates by other
agencies, principally those by the United States Census Bureau [14,
15] and the California State Department of Finance [8]. Latest estimates
of total California population, adjusted to July 1 (with percentage
differences, regardless of sign, from the California Taxpayers’
Association estimates in parentheses), are as follows:








                  Calif. Taxpayers’    Bureau of              Calif. Dept.
                  Assoc.               the Census             of Finance

July 1, 1951      11,014,500           11,038,000  (0.2%)     11,115,000 (0.9%)
July 1, 1952      11,603,450           11,758,000  (1.8%)     11,612,000 (0.1%)
July 1, 1953      12,169,200           12,190,000  (0.2%)     12,075,000 (0.8%)
July 1, 1954      12,606,200           12,554,000* (0.4%)     12,450,000 (1.3%)

* Received after completion of California Taxpayers’ Association estimates for January 1, 1955,
but included for comparison.
Such close agreement may be regarded as significant confirmation
for all three series. 


6. Estimates of Population from Mortality Data 


Independent estimates of the total population of the state for recent 
years are made using age-specific deaths and death rates. This involves 
extension of death rate experience. Estimates of the age cohorts up 
through age 14 are the most erratic and these are confirmed or sup- 
planted by estimates of those cohorts from birth and school data. 
This in turn gives some current information on current death rate 
levels and the continuation (or non-continuation) of trend for these 
cohorts. Estimates by mortality vary from estimates based on school 
counts but do contribute to the over-all context for determining the 
conversion factors. 


7. Grade to Grade Progression in Elementary School 


This line of study under stable conditions contributes to evaluation 
of gains by net migration but is currently complicated by the change of 
entry age for grade 1 already mentioned. 

These and other lines of attack are continuing projects to help reduce 
the element of judgment in determination of the conversion factors and 
replace or support it with research considerations. Each year the com- 
ponents for the most recent years are revised and refined and from them 
as a base the conversion factor for the next year is projected. 

Note that among other things the conversion factor compensates, 
first, for the half-year discrepancy between the date of the ratio change 
factors, which is July 1, and the date of the population estimates, 
which is January 1, and, second, for the error resulting from the use of 
annual totals of births in computing birth rates for the ratio change 
factors rather than their accumulation by months to agree exactly 
with the six to fourteen year age span at the Census date. 


TESTING AND ADJUSTMENTS 


Referring back to the formula, we are now down to the item “plus 
or minus adjustments.” As already stated, one must always recognize 
that the ratio change factor for any county in any year may be in 
error because population estimates for one or more of the included 
years may have been wrong, or births may have occurred at an ab- 
normally high or low rate in some years. Can we, then, bring into play 
some independent element to test the accuracy of the estimates of 
population of individual counties as represented by the “raw product” of 
school count times Census ratio times ratio change factor times conversion
factor?
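
The “raw product” named above is a plain chain of multiplications, to which the adjustment is then added or subtracted. A minimal Python sketch follows; every input value is hypothetical and chosen only to make the arithmetic easy to follow, and the function name is an assumption rather than anything used by the Association.

```python
# Sketch of the county estimate: raw product plus or minus an adjustment.
# All numeric inputs below are hypothetical illustrations, not source data.

def county_estimate(school_count, census_ratio, ratio_change_factor,
                    conversion_factor, adjustment=0.0):
    """Return (raw_product, adjusted_estimate) for one county."""
    raw = school_count * census_ratio * ratio_change_factor * conversion_factor
    return raw, raw + adjustment

raw, adjusted = county_estimate(
    school_count=25_000,        # grades 1-8 enrollment (hypothetical)
    census_ratio=7.5,           # population per pupil at the Census (hypothetical)
    ratio_change_factor=0.96,   # hypothetical
    conversion_factor=1.04,     # hypothetical
    adjustment=-2_000,          # hypothetical downward adjustment
)

print(round(raw), round(adjusted))
```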

For this purpose a testing procedure is used involving four statistical 
series—place-of-residence births and deaths, auto registrations, and 
voter registrations. In form, the test is to see if a change in the relative 
portion that a county’s population comprises of the state’s population 
is matched by a change in its portion of the state’s total of these four 
series. Again, any one of the four test series may be off-normal for a 
particular county in a particular year, so confirmation of the change for 
two consecutive years is required. For identification, this is referred to 
as “vertical series testing.” The name derives from the columns show- 
ing each of the test series year by year converted to percentages of the 
state total. 

This testing is most easily explained by illustration. In April, 1950 
Los Angeles County had 39.218 per cent of the state’s total population, 
and in January, 1954, according to the unadjusted estimate (the “raw 
product”), it had 40.856 per cent representing an increase of 1.638 
percentage points. Since we have estimated that Los Angeles had a 
larger portion of the state’s population in 1954 than in 1950, it is 
reasonable to assume it should have produced a correspondingly larger 
portion of the state’s births. In 1950 Los Angeles had 36.411 per cent 
of the state’s births and in 1954, 37.170 per cent, an increase of .759 
percentage points. Thus, we find that Los Angeles is, in fact, credited 
by the State Bureau of Vital Statistics with having had a larger portion 
of the births, but the reported gain in the county’s portion of births, 
.759 percentage points, is not as great as the estimated gain in portion 
of the population, 1.638 percentage points. Does this mean that the 
population estimate as represented by the “raw product” is too high? 
Perhaps, but before accepting this interpretation, we should see if it is 
confirmed by the other series similarly treated over two or more con- 
secutive years. The differences in percentage of total from 1950 to the 
indicated years were as follows: 








Los Angeles County               1953             1954

Population, calculated          1.079            1.638
Population, adjusted             .679            1.238

Births                           .759            1.322
Deaths                           .679           -1.452
Autos                            .760            1.278
Voters                      Not Available         .075
The positive differences in Los Angeles’s portion of the total state
population of 1.079 in 1953 and 1.638 in 1954 were greater than the
differences in all of the seven test items; hence, the larger fraction of
the state’s population is not supported by correspondingly larger fractions
of the total produced by the county in each of the four series.
In other words, the test indicates that the “raw product” estimates of
the population of Los Angeles County for the two years are too high.
Reducing the calculated population of Los Angeles County by 0.4
per cent of the state total population for both years gives the “adjusted”
differences shown above. The adjusted differences are approximately
central among the test differences. If we, then, adjust the estimate
for Los Angeles County by deducting 0.4 per cent of the state total,
which was about 50,000 in 1954, the gain in portion of population will
be more closely supported by the gain in portion of births, deaths,
voters, and autos actually tallied for the county.
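
The arithmetic of the Los Angeles illustration can be retraced in a few lines of Python. The sketch below uses only the percentages quoted in the text (population and births shares, and the 0.4 point adjustment); the helper names are assumptions, and the comparison function is a simplified stand-in for the judgment the author describes, not the Association's actual rule.

```python
# Vertical series test, Los Angeles County, using figures quoted in the text.
# Shares are per cent of the state total.

def share_gain(base_share: float, later_share: float) -> float:
    """Change in a county's share of the state total, in percentage points."""
    return round(later_share - base_share, 3)

def exceeds_all(population_gain: float, test_gains: list) -> bool:
    """True when the population gain is larger than every test-series gain,
    which the text reads as a sign that the raw product is too high."""
    return all(population_gain > g for g in test_gains)

pop_gain = share_gain(39.218, 40.856)      # April 1950 to January 1954 (raw)
births_gain = share_gain(36.411, 37.170)   # births share, as quoted in the text

adjustment_points = 0.4                    # deduction applied by the author
adjusted_gain = round(pop_gain - adjustment_points, 3)

print(pop_gain, births_gain, adjusted_gain)
print(exceeds_all(pop_gain, [births_gain]))
```

Only the births series is entered here because it is the one series whose underlying shares are spelled out in the text; the remaining series would be appended to the list in the same way.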

In the “1954 Cycle” adjustments up or down were applied to the 
“raw products” of 22 counties. They ranged from .4 points in Los 
Angeles County down to .01 points in a few of the smaller counties. The 
total of the adjusted estimates for the counties, after rounding, was 
taken as the estimate of population for the state. Where no clear indi- 
cation of need for an adjustment occurs, it is assumed the population 
figure for a county is about right or decision is suspended until the 
next annual cycle. 

It will be recognized that the vertical series test procedure is differ- 
ent from plain pro-rating. As pointed out in the discussion of the 
Census ratio, Madera County has a ratio less than half that of San 
Francisco because its birth rate is so much greater. On a simple pro- 
rata basis using births, Madera County would be given too large an 
estimate of population and San Francisco too small an estimate. Every 
county has its own peculiar character and its own specific production 
of births, deaths, voters, and auto registrations. Using the differences 
from the vertical series avoids the biases of pro-rating, but involves the 
assumption that each county’s own peculiar character continues from 
year to year. This assumption is weakest with respect to voter regis- 
trations since local issues may affect registrations rather markedly. 

Part of the process of determining adjustments from the vertical 
series testing is to refer back to the pattern testing of the ratio change 
factors. If a “raw product” appears high according to the vertical 
series test, a check is made to see if the corresponding ratio change fac- 
tor appears high on its scale. Vertical series adjustments may be made 
without these confirmations, but in many instances there is confirmation.
Checking out the vertical series tests for all the counties and
relating them to the pattern tests is laborious, but it is believed that
the resulting adjustments are in the right direction and consequently
reduce margins of error. Thus, the elements used to estimate each
county’s population are not evaluated singly in isolation but only in
context with those of all the other counties. Through these tests,
changes in demographic or economic character can be identified.


COVERAGE 


Estimates produced by this method are of total resident population. 
No effort is made to deduct military personnel because data on military 
personnel are usually classified and current estimates of total popula- 
tion fulfill many uses without separating the military. Using the pro- 
cedures here outlined, it is obvious that it would be necessary to 
restrict grade school counts to the children of the civilian population 
alone in order to produce estimates of civilian population. That, of 
course, is impossible. 

The 1950 Census counted individuals on military and naval posts 
where assigned. This included both individuals permanently assigned 
to the bases such as administrators, instructors, operators, etc., as 
well as those temporarily assigned, such as trainees being organized 
into units. Hence, the 1950 Census ratios included civilians together 
with permanently assigned military personnel and a corresponding load 
of temporarily assigned. During times of rapid military expansion, as
in 1951 and 1952, permanently assigned personnel may carry relatively
larger loads of temporarily assigned than in April, 1950, when military
establishments were at a low ebb. To the extent of that extra load, the
1950 Census ratios are not representative.

Californians temporarily assigned in a county in the state other than 
that of residence, or assigned outside the state on military duty are in- 
cluded in estimates made by this procedure because their families 
remain in the county of residence and are reflected in the school counts. 
Californians assigned within the state but outside their county of 
residence may offset one another somewhat. Also Californians as- 
signed outside the state and non-Californians assigned in the state 
may offset one another. The net errors are probably small in proportion
to the total population of most counties. They could be appreciable in
some, however. The error, when there is one, is probably that of not
fully reflecting temporarily assigned military personnel. This does not
preclude use of the estimates for many practical purposes, and while
the military “float” may mean business for local amusements, personal
services, and retailers, it is perhaps better discounted for evaluating
real growth and for many research purposes.


TESTS OF ACCURACY 


Comparison of estimates of population as of April 1, 1950 with the 
1950 census for the twenty-two larger California counties is shown in 
Table 5 [5]. California has 58 counties, of which the first 22, containing 
91 per cent of the state’s population, are arbitrarily regarded as 
“larger” because of their general character. 

Of the 22 “larger” counties, 14 had errors under 5 per cent, leaving 
eight with errors over 5 per cent, of which one, Monterey, had an 
error exceeding 10 per cent. The error of estimate for the state as a 
whole was 0.9 per cent and error of estimate for the 22 counties as a 
whole was 0.3 per cent. Among the 36 “smaller” counties, 20 of which 
had fewer than 20,000 inhabitants in 1950, 11 had errors less than 5 
per cent. Of the 25 with errors exceeding 5 per cent, 15 had errors ex- 
ceeding 10 per cent and 4 exceeding 25 per cent. 
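
The percentage errors quoted above follow directly from the Census counts and the estimates in Table 5. As a hedged illustration, the Python sketch below recomputes the deviation for a few rows of that table; the row selection and the function name are choices made here for illustration only.

```python
# Percentage deviation of the C.T.A. estimate from the 1950 Census,
# for selected rows of Table 5: (Census population, C.T.A. estimate).

def pct_deviation(census: int, estimate: int) -> float:
    """Deviation of the estimate from the Census, as a per cent of the Census."""
    return 100.0 * (estimate - census) / census

table5_rows = {
    "Los Angeles": (4_151_687, 4_141_600),
    "Monterey": (130_498, 109_600),
    "Sub-total, 22 larger counties": (9_613_700, 9_639_200),
    "State total": (10_586_223, 10_676_230),
}

deviations = {
    name: pct_deviation(census, estimate)
    for name, (census, estimate) in table5_rows.items()
}

for name, dev in deviations.items():
    print(f"{name}: {dev:+.1f} per cent")
```

Rounded to one decimal, the last two rows give the 0.3 and 0.9 per cent figures cited in the text, and Monterey is the one larger county beyond 10 per cent.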

In order to compare the estimates with some “bench mark” meas- 
ures of accuracy in population estimating, we may compare the per- 
centage deviations with percentage deviations of estimates for states 
and cities made for test purposes by the Bureau of the Census using 
various techniques [10]. Comparisons are shown in Table 6. Average 
deviations for the “larger” counties are slightly greater than those of 
the Census Bureau for states using “Migration and Natural Increase- 
Method II.” Deviations of the “smaller” counties and of all 58 counties 
as a group are greater than those reported for states by the Census
Bureau for any method. 

It is important to know in this connection that the basic statistical 
series—count of enrolment in grades 1 through 8—was changed in 
1947 from the former statistically inferior series known as “State 
Enrolment” to the current statistically excellent series of counts of 
active enrollments on each March 31 and October 31. As a result of this 
mid-decade change, it was necessary to use estimates based on correla- 
tions between the old and the new school counts in setting the 1940 
Census ratios. Tests—which need not here be explained in detail— 
indicate that the large errors among the “smaller” counties were due 
substantially to errors introduced by the 1947 change in counting 
method. For the current decade we have the improved school counts 
together with improvement in others of the materials used for popula- 
tion estimating. It is not unreasonable to assume, therefore, that 
results in 1960 will be as good as and probably better than they were 





ESTIMATING POPULATION OF COUNTIES 341 


TABLE 5

COMPARISON OF POPULATION ESTIMATES OF APRIL, 1950
BY CALIFORNIA TAXPAYERS’ ASSOCIATION
WITH 1950 CENSUS FIGURES

                    Census, April 1, 1950
                                   Per Cent        C.T.A.          Deviation of
                                   of State        Estimate,       Estimate
                                   Total           April 1,        from Census
County              Population     (Cumulative)    1950            (Persons)

Los Angeles          4,151,687       39.22          4,141,600       -10,087
San Francisco          775,357       46.54            778,600         3,243
Alameda                740,315       53.53            761,200        20,885
San Diego              556,808       58.79            567,300        10,492
Contra Costa           298,984       61.61            300,300         1,316

Santa Clara            290,547       64.35            288,100        -2,447
San Bernardino         281,642       67.01            287,500         5,858
Sacramento             277,140       69.63            289,500        12,360
Fresno                 276,515       72.24            290,900        14,385
San Mateo              235,659       74.47            233,700        -1,959

Kern                   228,309       76.63            223,100        -5,209
Orange                 216,224       78.67            201,200       -15,024
San Joaquin            200,750       80.57            204,900         4,150
Riverside              170,046       82.18            166,000        -4,046
Tulare                 149,264       83.59            159,900        10,636

Monterey               130,498       84.82            109,600       -20,898
Stanislaus             127,231       86.02            131,500         4,269
Ventura                114,647       87.10            103,500       -11,147
Solano                 104,833       88.09            110,100         5,267
Sonoma                 103,405       89.07            108,500         5,095

Santa Barbara           98,220       90.00             89,800        -8,420
Marin                   85,619       90.81             92,400         6,781

Sub-Total            9,613,700       90.81          9,639,200        25,500

Rest of State          972,523        9.19          1,037,030        64,507

STATE TOTAL         10,586,223                     10,676,230        90,007

[The “Per Cent” column of deviations is not legible in this copy.]
TABLE 6

COMPARISON OF PERCENTAGE DEVIATIONS OF ESTIMATES OF
CALIFORNIA COUNTIES BY CALIFORNIA TAXPAYERS’ ASSOCIATION
METHOD FROM THE 1950 CENSUS WITH
DEVIATIONS OF ESTIMATES FOR STATES AND
CITIES BY SELECTED METHODS

Column headings: Average Deviation; Quadratic Mean Deviation;
Deviations of 10% or More; Deviations of 5% or More; Positive Deviations.
[The numerical entries are not legible in this copy.]

California Taxpayers’ Ass’n. Method:
    California Counties
        “larger” (22)
        “smaller” (36)
        all (58)

Other Methods:*
    States (49) (Adjusted to national total)
        Migration and Natural Increase
            Method I
            Method II
        Vital Rates
        Arithmetic
        Geometric

    Cities (92)
        Migration and Natural Increase
            Method I
            Method II
        Vital Rates
        Arithmetic
        Geometric

* Source: See reference note 10.


in 1950. There is no way of judging what degree of improvement for 
both “larger” and “smaller” counties will result from the better com- 
ponents of this decade but it appears, both on the basis of rationale 
and possible results, that the procedure warrants continued use and 
development. 
THE MEMORY FACTOR IN SOCIAL SURVEYS
Percy G. Gray, Government Social Survey, London 


This article considers the problems of framing questions, 
the answers to which depend on the informants’ memories.
The problems are illustrated with examples from the British 
Survey of Sickness. In an experiment in which a group of gov- 
ernment employees were asked to recall what sick leave they 
had taken and when, the predominant error consisted of for- 
getting when the leave occurred rather than forgetting it 
completely. The results of checks on other memory questions 
show the importance of carefully defining and redefining for 
the informant the period of time a question covers. 


1. THE PROBLEM 


To obtain accurate answers to many of our questions in social surveys
we have to depend very much upon the memories of our
informants. In framing such questions two queries immediately arise. 
Firstly, should we ask about a period fixed in time, e.g., a particular 
week, or should we ask about a period ending on the day before the 
interview, e.g., the last seven days? Secondly, how long shall the period 
be, a day, a week, a month, or what? 


The first query arises in this way. No matter how much one may 
wish to do so, one cannot interview all of a randomly selected sample 
of people on a given day. An appreciable proportion of people will not 
be at home when the interviewer calls. Furthermore, practical con- 
siderations usually dictate that the interviewing shall be spread over 
at least a week, probably longer. 

Thus if we choose the fixed period of, say, the week January 1st-7th
then, if the interviewing is spread over a fortnight, the maximum 
period of recall demanded of informants will vary from 7 days up to 21 
days. (A simple analysis by date of interview will not necessarily show 
whether this has had any effect, since the groups interviewed on differ- 
ent days are not comparable. With a sample of adults, for example, 
higher proportions of housewives are likely to be interviewed in the 
early days of the interviewing period.) 

If, however, we ask about the seven days prior to the interview 
there are two difficulties. The most obvious is that different people 
will be asked about different periods of seven days. In many cases, 
there is little objection to this and one is prepared to treat the results 
as applying to an average week. The second difficulty, which should not, 
however, be exaggerated, is the possibility that there is some relationship
between the informant being found at home and the variable that
one is studying. This may be troublesome when the memory period is 
reduced to “yesterday,” i.e., the day before the interview. Consider an 
individual who is out on alternate evenings. If the interviewer’s first 
call is made on an evening when the informant is out, she may well 
make an appointment for the following night when the informant would 
be in. The effect of this might be to over-estimate, say, evening expen- 
diture on entertainment when based on “yesterday.” 

These are the difficulties which face one when deciding whether to 
ask about a period of constant length which is also fixed in time, or a 
period of constant length but ending on the day before the interview. 
The next point we will consider is the length of period to be adopted. 

Let us suppose we are dealing with medical consultations and that 
we wish to calculate the average consultation rate for a four-weekly 
period. In the first place let us ignore some of the practical interviewing 
difficulties which we have already mentioned, and assume that we 
can interview as many people as we like on any day we care to choose. 
Then we might make all our interviews on one day and ask about the 
past four weeks. Or at the end of the first week we might question a 
quarter of the total sample about that week and question a second quar- 
ter a week later about the second week, and so on. Or we might sub- 
divide our sample into 28 parts, spread the interviewing over 28 days, 
and ask only about the day before the interview. 

At first sight it might appear that the standard error of the estimated
four-weekly rate would vary as $1 : \sqrt{4} : \sqrt{28}$ for the three cases.
But this is not so: although we obtain information about four times 
as many weekly periods in the first case as in the second, it is the same 
group of people who are reporting about four consecutive weeks, and 
there is a tendency for the same people to see the doctor in each week. 

This point can be illustrated by some data which are available for 
rather different periods. In Table 1 are shown the number of consulta- 
tions made at doctors’ offices by a sample of 3533 adults in the two 
consecutive calendar months of December and January. The first 
column gives a distribution of the consultations for December, the 
second column for January, and the third for the two months com- 
bined. 

Now if the same random sample of 1,000 persons is asked about
their consultations in each of the two months, the standard error of
the average for the two-monthly period will be $1.300/\sqrt{1{,}000} = 0.0411$.
If, however, one sample of 1,000 persons is asked about December and
another separate sample of 1,000 persons is asked about January, then
the error of the estimated two-monthly average is
$[\{(0.731)^2 + (0.811)^2\}/1{,}000]^{1/2} = 0.0345$.


TABLE 1

PERSONS HAVING DIFFERENT NUMBERS OF CONSULTATIONS
AT A DOCTOR’S OFFICE IN TWO CONSECUTIVE MONTHS

                                                              December
Number of consultations            December      January        and
                                                              January

[The frequency distribution for 0 to 18 consultations is not legible in
this copy. Surviving entries, whose columns cannot be assigned, are:
3,015; 2,763; 298; 348; 190; 82; 78; 28; 12; 19.]

Total number of persons              3,533
Average number of consultations
   per person in the period          0.238
Standard deviation                   0.731

For the two months the correlation coefficient $\rho_{12} = +0.42$.
The three standard deviations and the correlation coefficient are of course connected by the relation
$\sigma_{1+2}^2 = \sigma_1^2 + \sigma_2^2 + 2\rho_{12}\sigma_1\sigma_2$.


The practical decision one has to make is as follows. If one can only
afford to make 1,000 interviews, should 500 be made at the beginning
of January dealing only with December, and a further 500 at the beginning
of February dealing only with January, or should all the 1,000
interviews be made at the beginning of February and deal with both
of the two months? The sampling error for the estimated two-monthly
average with the first procedure, as compared with the second, is increased
by the factor $(0.0345/0.0411)\sqrt{2}$, or about 1.19, not $\sqrt{2}$ as would
be approximately the case if there were no correlation. It is against this somewhat
smaller increase that we have to weigh the possibly greater error due
to memory, which may arise through asking people to remember their
experience for two months instead of one.
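
The standard errors and the factor quoted above can be checked numerically. The Python sketch below is only an illustration built from the three figures given in the text (the two monthly standard deviations and their correlation); the variable names are assumptions, and simple random sampling is assumed throughout, as in the text.

```python
# Standard error of the two-monthly average under the designs discussed.
# Inputs quoted in the text: monthly standard deviations and their correlation.
from math import sqrt

sd_dec, sd_jan, rho = 0.731, 0.811, 0.42

# Standard deviation of a person's two-month total, from
# sigma_{1+2}^2 = sigma_1^2 + sigma_2^2 + 2 * rho * sigma_1 * sigma_2.
sd_both = sqrt(sd_dec**2 + sd_jan**2 + 2 * rho * sd_dec * sd_jan)

# Same 1,000 persons asked about both months.
se_same_1000 = sd_both / sqrt(1000)

# Two separate samples of 1,000, one per month.
se_separate_1000 = sqrt((sd_dec**2 + sd_jan**2) / 1000)

# Fixed budget of 1,000 interviews: 500 per month versus 1,000 asked about both.
se_split_500 = sqrt((sd_dec**2 + sd_jan**2) / 500)
factor = se_split_500 / se_same_1000

print(round(sd_both, 3), round(se_same_1000, 4), round(se_separate_1000, 4))
print(round(factor, 2))
```

With a negative correlation, as in Table 2, the same formulas make the separate-sample standard error the larger of the two.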

With many variables dealt with in survey work there is likely to be 
this positive correlation over consecutive periods of time. The correla- 
tion may, however, be quite different between, say, consecutive days 
and between consecutive weeks. Occasionally there may be a negative 
correlation such as is found in the example of Table 2. 

TABLE 2

ANNUAL LEAVE TAKEN BY 461 PERSONS IN TWO
CONSECUTIVE MONTHS

                                                              July and
                                        July      August       August

Average number of days per person       3.72        ..          8.02
Standard deviation                      4.64        ..          5.70

(.. = entry not legible in this copy.)

For the two months the correlation coefficient $\rho_{12} = -0.25$.


In this case the standard error of the two-monthly average which
would be obtained from two separate samples of 1,000, each asked
about one month only, would be greater, not less, than that for one
sample of 1,000 asked about the two months.

Thus far we have considered the gains likely to result from increasing 
the length of the memory period if there were no memory losses. But 
these gains have to be weighed against any loss in accuracy due to 
memory. Such loss can take two forms. In the first place, the variance 
of the remembered values may be greater than the variance of the 
true values. In the second place, the average of the remembered values 
may be biased. Even if there is no bias and the errors are fully com- 
pensating the variance may still be increased. The relation between the 
variance of the remembered values and that of the true values is as 
follows: 


$$\sigma_r^2 = \sigma_t^2 + \sigma_d^2 + 2\rho_{dt}\sigma_t\sigma_d$$


where
$\sigma_r$ = standard deviation of the remembered values
$\sigma_t$ = standard deviation of the true values
$\sigma_d$ = standard deviation of the discrepancies (the discrepancy =
remembered value minus true value)
and
$\rho_{dt}$ = correlation between the discrepancies and the true values.


We see that the variance of the remembered values will exceed the
variance of the true values, unless the correlation between the discrepancies
and the true values ($\rho_{dt}$) is negative and greater in absolute value than
$\sigma_d/2\sigma_t$.
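
The condition just stated is easy to verify numerically. The following Python sketch evaluates the variance relation for a few correlations on either side of the threshold; the standard deviations used are hypothetical, and the function names are introduced here only for illustration.

```python
# Variance of remembered values versus variance of true values.
# sd_true and sd_disc below are hypothetical values chosen for illustration.

def remembered_variance(sd_true: float, sd_disc: float, rho_dt: float) -> float:
    """sigma_r^2 = sigma_t^2 + sigma_d^2 + 2 * rho_dt * sigma_t * sigma_d."""
    return sd_true**2 + sd_disc**2 + 2 * rho_dt * sd_true * sd_disc

def threshold_correlation(sd_true: float, sd_disc: float) -> float:
    """Correlation at which remembered and true variances are equal."""
    return -sd_disc / (2 * sd_true)

sd_true, sd_disc = 1.0, 0.5
threshold = threshold_correlation(sd_true, sd_disc)

for rho in (0.0, threshold, -0.5):
    var_r = remembered_variance(sd_true, sd_disc, rho)
    print(f"rho = {rho:+.2f}: remembered variance = {var_r:.3f}, "
          f"true variance = {sd_true**2:.3f}")
```

Only when the correlation falls below the threshold does the remembered variance drop under the true variance.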

It is possible in the normal pilot inquiry to determine approximately 
the reduction in the standard error to be expected from increasing the 
length of memory period, but it is generally impossible to measure the 
error due to memory. Indeed, we have found it extremely difficult to 
obtain information about memory errors even on the main inquiries. 
Sometimes it is possible to compare totals or averages with national 
estimates that are available, but the results are seldom exactly com- 
parable and serve only to detect large errors. Asking questions about 
different periods is of limited value, since a comparison of the results 
obtained usually involves the assumption that the shortest period 
produces the smallest memory error. What is required are experiments
in which the true and remembered values are known for each person
questioned, and such experiments are extremely difficult to arrange.
In an attempt to learn something of the errors arising due to memory
we did, however, devise a small experiment in November, 1951. But
before describing the experiment something must be said of the Sur- 
vey of Sickness, since it was experience with this survey which deter- 
mined the lines of the experiment. 


2. THE BRITISH SURVEY OF SICKNESS 


The Survey of Sickness commenced in 1944 and continued with a few 
gaps until the beginning of 1952. Sampling and interviewing were car- 
ried out by the Social Survey, while the General Register Office dealt 
with the classification of the illnesses and analysis of the results. Dur- 
ing the first fortnight of each month a different sample of adults was 
questioned about their illnesses in previous months. Until September, 
1949, informants were asked about illnesses in the three calendar
months prior to the interview. From September 1949 onwards they 
were only asked about the two previous calendar months. 

Thus, prior to September 1949 the sample interviewed in the first 
fortnight of, say, January was asked: 

“Did you have any illness, during October, November, 
and December?” 

The interviewer then went on to find out for each illness or injury, 
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whether it was present, and how many days of incapacity and medical 
consultations it caused, in each of the three months separately. From 
the data which accumulated month by month it was possible to calcu- 
late monthly, quarterly, and annual rates. Clearly if N persons are 
interviewed each month and there is no memory error, then the 
monthly rates can be based on the experience of 3N different people 
and the standard error will be reduced to 1/4/3 times that which would 
be obtained if informants had been asked about one month only. In 
calculating annual rates, however, it has to be remembered that al- 
though there are 36N monthly experiences, these are not the experi- 
ences of 36N different people. The equivalent size of sample is rather 
less than 36N, due to the positive correlation existing between people’s
experiences in the three consecutive months about which they are 
asked. 
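
The two sample-size remarks above can be illustrated with a minimal sketch. It assumes, purely for illustration, that the three monthly reports given by one person are equally correlated with a common coefficient rho; the function names, the value of N, and the values of rho are hypothetical and are not taken from the Survey.

```python
import math

def se_reduction_factor(samples_per_month: int) -> float:
    """Factor applied to the standard error of a monthly rate when the month
    is covered by this many independent samples of equal size."""
    return 1.0 / math.sqrt(samples_per_month)

def equivalent_sample_size(person_months: float, months_per_person: int, rho: float) -> float:
    """Equivalent number of independent person-months when each person reports
    on several months whose pairwise correlation is rho (assumed equal)."""
    design_effect = 1.0 + (months_per_person - 1) * rho
    return person_months / design_effect

N = 2400  # hypothetical number of persons interviewed each month
print("standard error factor:", round(se_reduction_factor(3), 3))
for rho in (0.0, 0.2, 0.5):  # hypothetical correlations
    print("rho =", rho, "equivalent size =", round(equivalent_sample_size(36 * N, 3, rho)))
```

With rho equal to zero the equivalent size is the full 36N; any positive correlation reduces it, which is the point made in the text.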

As soon as the results became available for three consecutive samples, 
it was found that there were appreciable differences between the results 
obtained at different times after the month of experience. Some results 
taken from an early report [1] are given in Table 3. 

TABLE 3

THE EFFECT OF DIFFERENT INTERVALS BETWEEN THE MONTH OF SICKNESS
EXPERIENCE AND THE MONTH OF INTERVIEW

Monthly incidence per 100 at risk

                    Interviews        Month of sickness experience
Type of illness     conducted in:     Nov.      Dec.      Jan.

Serious illness     January
                    February
                    March

Influenza           January
                    February
                    March

Colds               January
                    February
                    March

Note: The sample sizes were: Jan. 1,944; Feb. 2,399; Mar. 2,402.

It will be seen that there is no difference for December in the number
of persons with a serious illness but that there are appreciable differences
for influenza and colds. These findings were confirmed by later
results and in giving the results for January-April, 1945, it was stated
[2] that “the minor ailment rates . . . are based only on the two months
preceding the interviews since the recollection of minor complaints
three months ago is not reliable.”

Beginning with the March quarter of 1947 onward the results of the 
inquiry were presented in a series of tables of more or less standard 
form in the Registrar General’s Quarterly Returns. With the exception 
of the first table giving certain overall rates, only data based on the 
two most recent months prior to interview were used. This one table 
continued to be based on three months’ experience until the end of 1948, 
when it was replaced by data relating to only the two months prior 
to interview. However, informants were still asked about their sickness 
experience in the three months prior to interview until the interviews 
made in September 1949. In this month the questionnaire and inter- 
view were modified so that informants were asked about only the two, 
not three, calendar months prior to the interview. This continued until 
the survey was suspended in March 1952. Plans were then afoot to 
reduce the memory period to one month only. This was to be done at 
first on half the sample only, in order to study the effect, if any, on the 
results for the most recent month. (Such a split sample was
unfortunately not used in September 1949, when the reduction from the three
month to the two month memory period was made.) 

During this time the only published comments at any length made 
on the effect of memory are by Stocks in his report “Sickness in the 
Population of England and Wales in 1944-1947” [3]. He concluded 
that “no advantage would accrue by basing morbidity rates on the 
last month’s experience alone . . . even if the number of people inter- 
viewed each month was so increased that the sampling error would 
be the same.” The argument¹ leading to this conclusion is rather
involved and will not be reproduced here but subsequent examination
showed that it was not soundly based. There was in fact no evidence 
in favor of a memory period of two months. 

As Stocks realized, it may be very difficult to say when an illness 
started; a person’s idea of the starting date may well change, quite 
apart from memory, as the illness progresses. A more definite point in
time would seem to be the first medical consultation and it is medical 





¹ The analyses presented in reference [3] are based on only the most serious illness of each person,
not all illnesses as they should be.

consultations which we shall consider. In Fig. 1 we give the variation
in the number of medical consultations per person per month as re- 
ported at three different intervals after the month of experience. The 
top set of curves gives the results for 1947 and 1948 according to interviews
made in the month following the month of experience, and also
from interviews made one and two months later. The bottom set for 
1950 and 1951 consists of only two curves, as by then informants were 
only asked about two months, not three. (1949 has been omitted since 
the results for the third month are not available from October 1948, 
onwards while in September 1949, the memory period was reduced to 
two months.) It is at once apparent that the memory effect is not 
simply one of forgetting an increasing proportion of consultations as 
time goes by. It seems that the difference between the curves is 
affected by the way in which the actual rate is varying. For example, 
it appears that when the actual rate is falling sharply from a high 
figure, e.g., during the early parts of 1947 and 1951, there is a tendency 
for the curves to cross. 

This was the state of our knowledge in 1951. We suspected that most 
of the trouble arose because people were forgetting when consultations 
occurred rather than forgetting them completely. Ideally we would 
have liked to check each person’s answers with, say, his doctor’s 
records. Unfortunately, doctors do not in general keep sufficiently de- 
tailed records. We had therefore to fall back on the records of sickness 
absence and annual leave kept by a Government department. The 
funds and time available were a further serious limitation on what 
could be done. 


3. AN EXPERIMENT 


We chose for our experiment a group of Government offices where 
the records of annual leave and sick leave taken by the employees 
could be made available to us. Our investigators were supplied with 
forms similar to that shown in Figure 2 and a list of room numbers.
On entering a room the investigator explained the purpose of the in- 
quiry, handed out a form to each person present, and gave brief in- 
structions. As the investigator collected the forms she asked the people 
present not to talk to anyone else about the experiment until lunch- 
time. On leaving the room the investigator added the room number to 
her list of rooms visited so that we could tell the order in which they 
had been done. All the forms were completed during a period of about 
two hours on the morning of November 15th. 

Our choice of the 15th of the month was made to simulate the worst 
condition in the Survey of Sickness interviewing where interviews
were normally spread over the first fortnight of each month. We would 
have liked to have varied the memory period covered, i.e., the number 
of months, but with the limited time and resources at our disposal 
we could not afford to split the sample. We settled on the four calendar 
months of July, August, September, and October, and the part of 
November as the period we would ask about. In getting people to fill in 
the forms we were, of course, departing from an interview proper, but 
this was the only way to get appreciable numbers of cases with the 
time and labor available. It seems doubtful however that an interview 
would have produced better results. 

        PLEASE DO THIS FROM MEMORY

        EVEN IF YOU DON’T THINK YOU CAN REMEMBER CORRECTLY,
        WOULD YOU MAKE THE BEST ESTIMATE YOU CAN

ANNUAL LEAVE                                       SICK LEAVE

MONTH         Number of     If you had no          MONTH         Number of
              days taken    leave in any                         days taken
                            month write in
                            “NONE”
JULY                                               JULY
AUGUST                      If you had more        AUGUST
                            than one period
SEPTEMBER                   of leave within        SEPTEMBER
                            a month, please
OCTOBER                     enter as follows:      OCTOBER
                            1+2, or 6+3, etc.
NOVEMBER                                           NOVEMBER
(to date)                                          (to date)

LEAVE THIS BLANK

Room Number
Name (Block Capitals)
Division

Fig. 2. Form used in the experiment.

Although we asked people not to talk about the inquiry, a few
certainly told their friends in other offices. In the few cases where
people were known to have heard of the inquiry in advance, their 
forms were discarded. It is probable, however, that a small number 
of those who got completely correct forms had looked up the answers.
The effect cannot be very great since there were so few all-correct
forms. Furthermore, when the forms were divided into three groups
according to how early they were completed, we found that there was
no “improvement” in memory as the morning wore on.

The sick leave and annual leave taken by each person was later obtained
from the department’s leave records and added to the forms.
In addition to the period covered by the forms the leave taken in the 
earlier month of June was also recorded. There is no reason to doubt 
the accuracy of the leave records and, in what follows, it has been as- 
sumed that they provide the true values of the leave taken. During 
this process of adding the true values to the forms a few were found to 
be incomplete and a few were found where the informant had had 
special leave which he might, or might not, have counted. These were 
discarded. The remaining 433 forms provide the analyses which follow. 

In the case of annual leave there is an unusual factor in that each 
person is entitled to a certain definite number of days, varying with 
grade, during the twelve months commencing February 1st. Thus
although a person might not know just when he had taken leave he
might have a good idea of the total amount taken. Sick leave² does not
have this disadvantage and will be dealt with first.

Of the 433 people there were 205 who, according to the staff records, 
had not had any sick leave during the months in question. Of these 
192 recorded none on their forms. Of the remaining 13 there were 11 
who gave a figure for one month and 2 who gave figures for two 
months. 

There were 144 persons who had taken sick leave in one month 
only. Of these 59 recorded both the amount and month correctly on 
their forms, 25 gave the correct amount but the wrong month, while 
22 got the month right but the amount wrong. There were 24 cases 
where no leave was recorded, and there were 14 cases where leave was 
recorded in more than one month though it had only been taken in one 
month. 

Sick leave had been taken in two months by 63 people and of these 
11 got both periods correct, 21 got one period correct, while the remain- 
ing 31 got neither period correct. 

² The detailed results can be obtained from the author.

The remaining 21 persons had taken sick leave in three or more
months and of these 4 got all periods correct. The results for all 433
persons may be summarized in that of the 205 who had taken no leave, 
192 gave the correct answers, while of the 228 who had taken some, 
only 74 gave completely correct answers. In view of this it is perhaps 
surprising to find that the average of the remembered values for the 
whole period differs from the true value by only 1 per cent. A comparison
between the remembered and true values for each month and
for all the months together is given in Table 4.

TABLE 4
AVERAGES AND VARIANCES FOR SICK LEAVE

July | August | September | October | November 1-14th | All months

Average (days)
    True values                          0.279    2.644
    Remembered                                    2.609
    Discrepancies                                −0.035

Average discrepancy as a percentage of the average true value

Variances
    True values σ_t²
    Remembered σ_r²
    Discrepancies σ_d²

Ratio of the variance of the remembered values to the variance of the true values

Standard error of discrepancies

Correlation between discrepancies and true values ρ_dt     −0.082    −0.091

Discrepancy = Remembered Value − True Value.
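
As a check on the arithmetic, the counts quoted in the preceding paragraphs can be tallied in a few lines. The sketch below uses only figures stated in the text; the variable names are illustrative.

```python
# Sick-leave counts as quoted in the text (433 usable forms).
persons = {"no leave": 205, "one month": 144, "two months": 63, "three or more": 21}
fully_correct = {"no leave": 192, "one month": 59, "two months": 11, "three or more": 4}

# Breakdown of the 144 who took leave in one month only:
# both right, wrong month, wrong amount, nothing recorded, recorded in several months.
one_month_breakdown = [59, 25, 22, 24, 14]
# Breakdown of the 63 who took leave in two months: both, one, neither period correct.
two_month_breakdown = [11, 21, 31]

total = sum(persons.values())
took_some = total - persons["no leave"]
correct_among_takers = sum(v for k, v in fully_correct.items() if k != "no leave")

# Average days per person over the whole period, true and remembered.
true_mean, remembered_mean = 2.644, 2.609
relative_discrepancy = (remembered_mean - true_mean) / true_mean

print(total, took_some, correct_among_takers)
print(round(100 * relative_discrepancy, 1), "per cent")
```

The totals agree with the summary given in the text, and the overall discrepancy works out at a little over 1 per cent of the true average.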

When the average discrepancy is expressed as a percentage of the 
average true value, this varies from +22 per cent for the fortnight of 
November to —11 per cent for October. For all but one month the 
variance of the remembered values exceeds the variance of the true 
values. In this month there is, of course, an appreciable negative cor- 
relation between the discrepancies and the true values. In no case does 
the average discrepancy exceed twice its standard error. Thus we can- 
not say that the average discrepancies differ significantly from zero, 
i.e., that there is a residual bias. Clearly an experiment on a much 
larger scale was required to determine this. Something can, however,
be learned of the types of error that were made. This analysis is presented
in Table 5.

TABLE 5
TYPES OF ERROR MADE IN REMEMBERING SICK LEAVE

July | August | September | October | November 1-14th

Correct
    No leave taken
    Some taken

Transference
    In from later month
    In from earlier month
    May be in from earlier month, may be invention

Transference
    Out to later month
    Out to earlier month
    May be out to earlier month, may be forgotten

Wrong amount
    Overestimate
    Underestimate

All Persons                433    433

Each month has been considered separately. Thus for September 
there were 369 correct entries, 341 being cases where no leave was
taken. There were 13 cases where leave, which had been taken in either 
October or November, was entered for September. Then there were 11 
cases where leave was recorded for September which was actually 
taken in August, July, or June. There were 2 cases where the leave 
was either purely imaginary or had actually been taken before June 
(we had no information about earlier leave). 
On the other hand, there were 12 cases where leave actually taken in
September had been recorded for either October or November. Simi- 
larly, there are 4 cases where it had been recorded for either August or 
July. Then there were 13 cases where the leave may have been com- 
pletely forgotten, or where it may have been thought to have occurred 
prior to July. (We do not know what the informant thought about 
earlier months.)

Finally, there were 4 cases where the informant remembered some 
leave for September but overestimated it, while in 5 cases the leave 
was remembered but underestimated. 
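
The September entries described above account for all 433 forms, as the following short tally shows. It is a sketch using only the counts quoted in the text; the labels are paraphrases of the Table 5 row headings.

```python
# September column as described in the text.
september = {
    "correct": 369,                     # of which 341 were cases of no leave taken
    "in from later month": 13,
    "in from earlier month": 11,
    "in from earlier month or invention": 2,
    "out to later month": 12,
    "out to earlier month": 4,
    "out to earlier month or forgotten": 13,
    "overestimate": 4,
    "underestimate": 5,
}

total = sum(september.values())
transfers_in = sum(v for k, v in september.items() if k.startswith("in "))
transfers_out = sum(v for k, v in september.items() if k.startswith("out "))
wrong_amount = september["overestimate"] + september["underestimate"]

print("total forms:", total)
print("transfers in:", transfers_in, "transfers out:", transfers_out)
print("wrong amount only:", wrong_amount)
```

Transfers in and out are of similar size and together far outnumber the cases of a wrong amount alone, which is the compensating pattern discussed below.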

In making this classification a period of leave, which was remem- 
bered wrongly both as to amount and month, has been classified as a 
transference. Also any period wrongly recorded as “none” appears as 
a transference not an underestimate. It will be clear that in making 
such a classification a certain amount of judgment has been used and 
the results must only be taken as indicating the order of things. In 
comparing months it must be remembered that we had not available 
the actual leave taken prior to June, and we do not know what the 
informant thought he had taken prior to July. 

In spite of these limitations, the classification is of value in that it 
draws attention to the importance of transference and the compen- 
sating nature of many of the errors. Forgetting mainly took the form of 
forgetting when rather than forgetting completely. With so much trans- 
ference occurring, there is every reason to expect that estimates for a 
particular period of time will be affected by experience prior to this 
period, as would appear to be the case with medical consultations in the 
Survey of Sickness. There would appear to be a case for taking the 
memory period up to the day before the interview, if this is possible, 
since this closes the door to one set of transfers. 

Analysis of the data on annual leave gave similar results to those 
for sick leave. The averages and variances are given in Table 6. 

Again, as in the case of sick leave, the variance of the remembered 
values exceeds the variance of the true values, except for the month of 
October. Again, the small sample size makes it impossible to say 
whether the average discrepancies differ significantly from zero. 
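
The link between the variance ratio and the correlation follows from the identity σ_r² = σ_t² + σ_d² + 2ρ_dt σ_d σ_t, since the remembered value is the true value plus the discrepancy. The remembered variance can therefore fall below the true variance only when ρ_dt is more negative than −σ_d/(2σ_t). The sketch below checks the identity on simulated data; the data and the memory model are invented for illustration and are not the experimental results.

```python
import random

def mean(xs):
    return sum(xs) / len(xs)

def variance(xs):
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

def covariance(xs, ys):
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)

def variance_ratio(sd_true, sd_disc, rho):
    """Variance of remembered values divided by variance of true values."""
    return (sd_true ** 2 + sd_disc ** 2 + 2 * rho * sd_disc * sd_true) / sd_true ** 2

# Hypothetical data: true days of leave, and discrepancies that partly offset them.
rng = random.Random(0)
true_days = [rng.choice([0, 0, 0, 1, 2, 5]) for _ in range(433)]
discrepancies = [-0.3 * t + rng.gauss(0, 1) for t in true_days]
remembered = [t + d for t, d in zip(true_days, discrepancies)]

lhs = variance(remembered)
rhs = variance(true_days) + variance(discrepancies) + 2 * covariance(true_days, discrepancies)
print("identity holds:", abs(lhs - rhs) < 1e-9)
print("ratio with rho = 0:", variance_ratio(1.0, 0.5, 0.0))
print("ratio with rho = -0.5:", variance_ratio(1.0, 0.5, -0.5))
```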

An attempt to analyze the types of error, as in Table 5, proved more 
difficult than in the case of sick leave as the number of periods of 
annual leave taken by each person was much greater. Nevertheless it 
was again clear that transference was the major factor. 

TABLE 6
AVERAGES AND VARIANCES FOR ANNUAL LEAVE

July | August | September | October | November 1-14th

Averages (days)
    True values                          0.439
    Remembered                           0.422
    Discrepancies                       −0.017

Average discrepancy as a percentage of the average true value     −4%

Variances
    True values σ_t²
    Remembered σ_r²
    Discrepancies σ_d²

Ratio of the variance of the remembered values to the variance of the true values

Standard error of discrepancies

Correlation between discrepancies and true values ρ_dt     −0.212    −0.187

Discrepancy = Remembered Value − True Value.

This experiment left us with many questions unanswered. What
would be the effect of not asking about the fortnight in November?
What would be the effect of altering the number of months? Our plans
for collecting some evidence on these points fell through with the
suspension of the Survey of Sickness and we shall have to wait for another
opportunity to collect further data. Before leaving the subject we will
give in the next section some other examples of questions that have
been used.


4. SOME MEMORY QUESTIONS 


Experience with memory questions has made us realise the danger 
that errors arising from other causes may be wrongly attributed to 
memory. It is important to remember that, to the informant, any 
recent medical consultation is interesting, not just the one occurring 
“yesterday” or in the “last seven days.” The importance of this, and
other factors, is brought out in the following examples:

Example (i)

Did you consult a doctor yesterday, that is on ______?
(WRITE IN THE DAY OF THE WEEK)
IF YES (Y), for what?


This question was used experimentally at various times during 1951 
as the final question in the Survey of Sickness interviews. It started 
life without the concluding words, “that is on ______,” and also
without the dependent question “for what?” These were added after 
checks had been made by mail and by reinterviewing. Checking the 
answers to memory questions by re-interview or by mail cannot of 
course be exact, due to the interval of time that must elapse before the 
check is made, but these checks did reveal some errors. We found that 
the interviewer was sometimes dating her questionnaires wrongly. We 
also found that a few informants were saying “yes” when a consulta- 
tion had occurred on the day of the interview. The addition of “that is 
on ______” was an attempt to deal with this. There were also
a number of cases where the code Y (Yes) had been rung where the 
informant later insisted there had been no consultation on that day, 
or any day near to it (they confirmed however that interviews had 
been made on the correct days). However we found no code X (No) 
which should have been code Y. This could occur if they were record- 
ing errors, and if it was equally likely for a code Y to be wrongly rung 
for a code X, as an X for a Y. For suppose such a recording error oc- 
curred once per five hundred questionnaires, then with a sample of 
4,000 we might expect to find no X wrongly rung but eight Y’s wrongly 
rung (the number of persons who would have consulted a doctor the 
day before the interview would be between 40 and 80). This would ap- 
pear to be the reason for these errors. “IF YES (Y), for what?”, even 
if the answers are never themselves used, would appear to be the answer 
to this problem. Indeed this use of dependent open questions, where the 
actual answers may not even be used, seems to be of general applica- 
bility in reducing the recording errors on precoded questions. 
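
The asymmetry described above is a matter of simple expectation, sketched below. The error rate of one per five hundred, the sample of 4,000, and the range of 40 to 80 true consultations are the figures from the text; the function name and the assumption that each questionnaire is miscoded independently are illustrative.

```python
def expected_miscodes(sample_size, true_yes, error_rate):
    """Expected numbers of wrongly rung codes when every questionnaire has the
    same chance of having the opposite code rung."""
    true_no = sample_size - true_yes
    return {
        "Y rung, should be X": true_no * error_rate,
        "X rung, should be Y": true_yes * error_rate,
    }

for true_yes in (40, 80):
    result = expected_miscodes(4000, true_yes, 1 / 500)
    print(true_yes, {k: round(v, 2) for k, v in result.items()})
```

Because true X answers outnumber true Y answers by fifty or a hundred to one, nearly all of the eight or so expected recording errors fall on the Y side, and the expected number of X codes wrongly rung is well below one.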








Example (ii)

Have you consulted a doctor during the last seven days, that is from last
______ to yesterday, inclusive?                              Yes  Y
(WRITE IN AND NAME DAY)                                      No   X

IF YES (Y) which days?
(NAME DAYS AND CHECK THAT IN RIGHT PERIOD)
This question was also used experimentally during 1951 in the Survey
of Sickness interview. It also started life without the concluding words,
“that is . . .” Checks revealed that the last seven days were
not sufficiently well-defined. There were cases of informants including 
the day of interview as well as the seven days intended, and cases where 
this day was counted as one of the seven days. To cope with this the 
concluding phrase was added, and the interviewers were asked to check 
that the days named in answer to the dependent question were in the 
right period. Our checks suggested that with this memory period there 
was then little or no memory error. 
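
The period intended by the question can be stated exactly: it runs from the same weekday one week before the interview up to and including the day before the interview, and excludes the day of interview itself. A minimal sketch of the interviewer's check follows; the function names and the interview date are hypothetical.

```python
from datetime import date, timedelta

def seven_day_window(interview_day):
    """First and last day of the memory period: the seven days ending yesterday."""
    first = interview_day - timedelta(days=7)
    last = interview_day - timedelta(days=1)
    return first, last

def in_memory_period(consultation_day, interview_day):
    """True if a named consultation day falls in the right period."""
    first, last = seven_day_window(interview_day)
    return first <= consultation_day <= last

interview = date(1952, 3, 11)  # hypothetical interview date
first, last = seven_day_window(interview)
print("from last", first.strftime("%A"), "to yesterday,", last.strftime("%A"))
print("day of interview counted:", in_memory_period(interview, interview))
```

The first day of the period always falls on the same day of the week as the interview, which is why the interviewer writes in that day's name.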





Example (iii)

FIRST, I WOULD LIKE TO ASK YOU ABOUT THE LAST SEVEN DAYS, THAT IS
FROM LAST ______ TO YESTERDAY, INCLUSIVE.

(WRITE IN AND NAME DAY. IT WILL BE THE SAME DAY OF THE WEEK
AS THE DAY OF INTERVIEW.)

(A small calendar for February was printed alongside.)

1. Have you consulted a doctor for any reason at all during the last seven days, that is from last
______ to yesterday, inclusive? (INCLUDE ANY CONSULTATIONS ALREADY MENTIONED.)

   Yes  Y
   No   X

   IF NO (X) When did you last consult a doctor for any reason at all?
   GO ON TO QUESTION 2

   IF YES (Y) How many times?
   ASK FOR EACH CONSULTATION IN TURN:

                                        VISIT A            VISIT B            VISIT C

   (a) Which days?

   (b) Did you see your usual doctor    Yes  Y             Yes  Y             Yes  Y
       on this visit?
       IF NO Who did you see?           Who seen ______    Who seen ______    Who seen ______

   (c) Was this under the National      National Health X  National Health X  National Health X
       Health Scheme?                   Private  O         Private  O         Private  O

Other questions followed
































These questions were used in March 1952 in an inquiry [4] carried
out for the Committee on General Practice of the Central Health Services
Council. They were added to the end of the Survey of Sickness
questionnaire. It will be seen that the memory period chosen was the
last seven days prior to interview. This period was chosen partly as a
result of our earlier experiments but mainly because, on this inquiry, as
distinct from the Survey of Sickness itself, far greater detail was required
about what occurred during a sample of consultations. Other dependent
questions followed the three printed above, asking for such
details as the prescriptions issued and the certificates given. It was
mainly to prevent confusion between what had occurred at different
consultations that these questions were confined to consultations made
in the last seven days.

It will be seen that the informant was told the memory period twice, 
once in the introduction and once in the main question itself. The inter- 
viewer had to write in the day of the week in each case and, as is our 
general practice with memory questions, was able to refer to a small 
calendar printed alongside the question. Both Yes and No answers were 
followed by dependent questions, the Yeses being asked how many 
times and on which days. Where only one consultation was made, the 
day of the week was entered in column A and the other dependent 
questions followed. Where more than one consultation had occurred 
the days of the week were entered at the heads of the columns and the 
interviewer asked all the questions, first about, say, “the consultation 
on Tuesday” and then repeated the questions for “the consultation on 
Thursday”, and so on. The arrangement of the questions made it easy 
for the interviewer to check that the same medicine or certificate was 
not wrongly attributed to two consultations. 

No attempt was made on this occasion to deal with the problem, al- 
ready mentioned in the first section of this paper, which arises because 
people are not necessarily found at home when the interviewer calls 
the first time. However, these questions were to have been asked with 
the Survey of Sickness in the two following months of April and May 
and it was hoped to study this effect in the last month. Our plans in- 
volved splitting the sample into two, confining the interviewing for 
each part to one week and recording the date of each call at every ad- 
dress. This meant in practice that the last seven days would always 
include the day of the first call. In this way we hoped to learn something 
of the effect of any relationship between seeing the doctor and being at 
home when the interviewer called. Unfortunately, the suspension of 
the Survey of Sickness also resulted in the cancellation of the remain- 
ing portions of this inquiry. 


5. CONCLUSION 


The examples given in this article are confined to one type of survey 
dealing with sickness. Different results may well be obtained with differ- 
ent subjects. Furthermore, only single interview surveys have been 
considered. Longer memory periods would be possible, no doubt, if an 
interview were made at the beginning, as well as the end of the memory 
period, or if some form of record-keeping was adopted. Nevertheless it 
does illustrate the factors to be kept in mind in dealing with memory
questions. 

In considering the length of memory period to be chosen it is necessary
to remember that there is likely, with many variables, to be a
positive correlation over consecutive periods of time, with the result 
that the gain obtained by increasing the length of memory period may 
be less than might be expected. The loss due to memory takes two 
forms. The variance of the remembered values is likely to exceed the 
variance of the true values and there is likely to be a residual bias. 
The predominant type of error in the experiment reported was one of 
transference, where the informant transfers the events both in and 
out at both ends of the selected period. This suggests that a period ending
on the day before the interview is to be preferred, wherever possible,
since this closes the door to one set of transfers. An interview made at
the beginning of the period would no doubt help to prevent transfers
at that end of the period. Perhaps a letter might serve the same purpose.

The choice of such a memory period means that it can be fixed in
length, but not in time, since not all the interviews, or even a
subsample of them, can be made on a given day. Apart from this obvious
disadvantage that the periods do not all commence and end on the
same day, there is another disadvantage which is perhaps not always 
appreciated. It arises because not all the informants are found at home 
at the first call, and it is possible for some relationship to exist between 
the variable being studied and the likelihood of the informant being 
found at home on a given day. Little information is yet available on 
this point. 

Our experience with a number of questions has made us realize the 
importance of carefully defining, and redefining, the memory period 
for the informant. It has to be remembered that it is often the event, 
and not when it happened, which appears important to the informant. 

The suspension of the Survey of Sickness brought to an end this par- 
ticular line of inquiry and it seems an appropriate time to describe 
what has been learned. I am well aware that this article raises more 
questions than it answers but it has been written in the hope that it 
will stimulate others to publish their data on this important subject. 
Little has been published. Of the very sparse literature on the subject, 
perhaps the most interesting is an article by Mauldin and Marks [5], 
though they, too, give little quantitative data. 

Finally, I should like to acknowledge the help that I have received 
from my colleagues both present and past. 
MEASURING THE ERROR OF EDITING THE QUESTIONNAIRES IN A CENSUS


SVEIN NORDBOTTEN
Central Bureau of Statistics, Oslo 


This paper describes an attempt to measure the relative 
errors in totals and their influence on size distributions due to 
the questionnaire editing in a census. The experiment de- 
scribed was performed in connection with the 1953 Industrial 
Census in Norway. The results indicate that the errors are 
relatively small and their influence on distributions insignifi- 
cant. The gains of a more thorough editing would probably 
be small and efforts might perhaps be better spent by im- 
proving the census procedure by other means. 


INTRODUCTION 


THE results of a census are often published without any statement
of the quality of the statistics. Both the professional statistician
and the consumers of statistics would obviously profit by a greater 
knowledge of the accuracy of the statistics. 

In a large census there may arise a great number of errors which may 
have an important influence on the results. Deming [2] gives an exten- 
sive list of possible errors. The professional statistician will be inter- 
ested not only in a quantitative measure of the different error com- 
ponents, but also in their causes in order to be able to make more 
efficient census designs in the future. 

The error components may be classified in two main groups: 

(a) Random errors which arise because some element of random
selection has been introduced in the census procedure.

(b) Non-random errors which are due to factors such as bad design, 
collection, editing, processing, etc. This group may be subdivided 
into subclasses. 

Different approaches have been tried in evaluating total or relative 
non-random errors and their components in surveys and censuses. One 
which seems to be successful [3], is to check a small probability sample 
and compare census and sample results item-by-item. 

In connection with the 1953 Industrial Census in Norway an experi- 
ment was done along these lines in order to try methods for evaluating 
quantitative measures for the editing errors and testing their influence 
on distributions of establishments. 
DESCRIPTION OF THE EXPERIMENT


The purpose of the experiment presented in this paper was to investi- 
gate the relative editing error of statistics describing a mass of small 
manufacturing establishments, and the extent to which these errors 
influence certain distributions of the establishments. 

The experiment was limited to Industrial Census questionnaires 
from six to seven thousand establishments, edited by six persons. Each 
day a five per cent systematic sample with random starts was drawn 
from the questionnaires edited that day by each person. The sample 
was then submitted to a second and very thorough control editing, 
independent of the original edit, with respect to employment, wage 
expenditures and value of production. The control editing was done 
by one specially instructed person. 

The six persons performing the first editing were instructed not to 
contact the respondents for any supplementary information. When no 
answer was given or when an answer was supposed to be wrong, they 
were to make a rough estimate of the characteristic. In this way, 
6620 questionnaires were edited. On the other hand, the one person 
performing the control editing was instructed to use all means to get 
correct information about employment, wage expenditures, and the
value of production. For 103 of the 331 sample questionnaires, one or 
more requests for further information were sent. 
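
The daily sampling step can be sketched as follows. This is an illustration only: it assumes a fixed interval of 20 for the five per cent sample and treats a day's edited questionnaires for one person as an ordered list; the function name and the use of integers for questionnaires are hypothetical.

```python
import random

def systematic_sample(items, interval=20, rng=random):
    """Systematic sample of every `interval`-th item after a random start.
    An interval of 20 corresponds to a five per cent sample."""
    if not items:
        return []
    start = rng.randrange(interval)  # random start within the first interval
    return items[start::interval]

# All 6,620 first-edited questionnaires taken as one ordered batch (illustrative).
rng = random.Random(1)
questionnaires = list(range(1, 6621))
sample = systematic_sample(questionnaires, interval=20, rng=rng)
print(len(sample))
```

Taken as a single batch, 6,620 questionnaires yield 331 whatever the random start, which agrees with the sample size quoted in the text. Applied to each day's work of each person separately, the batch sizes would vary by one or so with the start.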

These definitions were used: 

(a) The individual error of a characteristic is the difference of the 
result obtained by the first and the second editing of a question- 
naire. 

(b) The relative error of a characteristic is the sum of all individual 
errors divided by the sum of the result obtained if all question- 
naires were submitted to the control editing. 

The following notation was used:

$n$ = total number of establishments in the sample.
$N$ = total number of establishments in the population.
$f$ = sampling fraction.
$L$ = number of persons performing the editing.
$n_h$ = number of establishments in the sample from the $h$th person.
$x_{hi}$ = individual error on questionnaire $i$ in the sample from the $h$th person.
$y_{hi}$ = value of the characteristic for questionnaire $i$ in the sample from the $h$th person after control editing.
$r_h$ = estimate of the relative error of the population edited by the $h$th person.
$r$ = estimate of the relative editing error.
$s^2$ = estimate of the variance of $r$.
$a_i(k)$ = 1 if the $i$th establishment in the sample is classified in class $k$ in the first editing, = 0 otherwise.
$b_i(k)$ = 1 if the $i$th establishment in the sample is classified in class $k$ in the control editing, = 0 otherwise.
$c_i(k)$ = 1 if the $i$th establishment in the sample is classified in class $k$ in both editings, = 0 otherwise.
$M$ = number of classes.

The first aim was to estimate the relative errors of the characteristics. 
These errors were estimated by means of the ratio estimator 


$$r = \frac{\sum_{h=1}^{L}\sum_{i=1}^{n_h} x_{hi}}{\sum_{h=1}^{L}\sum_{i=1}^{n_h} y_{hi}} \tag{1}$$
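
As an illustration of the ratio estimator (1), the following minimal Python sketch pools the sampled questionnaires of all editors and divides the sum of the individual errors by the sum of the control-edited values. The function name, the sign convention (first editing minus control editing), and the five employment figures are assumptions made for the illustration; they are not data from the experiment.

```python
def relative_error(first_edit, control_edit):
    """Ratio estimate r = (sum of individual errors) / (sum of control-edited values)."""
    # Individual error: first-editing result minus control-editing result.
    # The sign convention is an assumption of this sketch.
    errors = [a - b for a, b in zip(first_edit, control_edit)]
    return sum(errors) / sum(control_edit)


# Hypothetical employment figures for five sampled questionnaires.
first_edit = [12, 8, 20, 5, 15]
control_edit = [12, 7, 20, 5, 14]

r = relative_error(first_edit, control_edit)
print(f"estimated relative error: {r:.4f}")
```

Because numerator and denominator are both sample totals, the estimate is a ratio of random quantities, which is the source of the bias discussed next.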


This estimate is biased. On the other hand, it is consistent and the 
bias in this case is expected to be insignificant. When the number of 
questionnaires included in the population is as great as here and the 
sampling fraction is only five per cent, the estimate should have an 
approximate normal distribution if the sample is regarded as a stratified
random sample. The variance of r was estimated by [1]

To test the hypothesis that the editing errors had no influence on
the distributions of the establishments by employment, wage expenditures,
and value of production, the following variable was formed
$$h_2(k) = \frac{1}{n}\sum_{i=1}^{n} b_i(k), \tag{5}$$

$$\operatorname{var}\,[h_1(k) - h_2(k)] = \frac{1}{n}\left(\frac{N-n}{N-1}\right)\frac{1}{n}\left[\sum_{i=1}^{n} a_i(k) + \sum_{i=1}^{n} b_i(k) - 2\sum_{i=1}^{n} c_i(k)\right].$$

Assuming that the differences $h_1(k) - h_2(k)$, $k = 1, \ldots, M$, are normally
distributed, the variable $K^2$ will be approximately chi-square
distributed with $M - 1$ degrees of freedom [4].
We chose a one per cent level of significance and defined a value
$\chi_0^2$ such that the probability

$$P(\chi_0^2 \ge K^2;\; M - 1) = 0.99. \tag{7}$$

The hypothesis that the editing error does not influence the distribution
of the establishments should be rejected when $K^2 > \chi_0^2$.





THE RESULTS OF THE EXPERIMENT 
The estimates of the relative editing errors and their standard devi- 
ations computed from formulas (1) and (2) are given in Table 1. 
TABLE 1

ESTIMATES OF THE RELATIVE EDITING ERRORS AND
THEIR STANDARD DEVIATIONS

                         Relative editing    Standard
                         error               deviation
Employment               0.0273              0.0126
Wage expenditures        0.0248              0.0128
Value of production      0.0172              0.0114





The table indicates that the relative editing errors are rather small 
in spite of the imperfect editing. All of them seem to be positive. Even 
though there is no significant difference, it is also interesting to note 
that the relative error is lowest for the value of production. With a 
confidence coefficient of 0.99, the maximum relative error of employ- 
ment is 6.5%, of wage expenditures 6.3%, and of the value of pro- 
duction 5.1%. 
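
The maximum relative errors quoted above can be reproduced from Table 1 by adding three standard deviations to each estimate. The multiplier 3 is an assumption inferred from the published figures; the text does not state how the limits were computed. A short Python sketch of the arithmetic:

```python
# Estimates and standard deviations from Table 1.
table1 = {
    "Employment": (0.0273, 0.0126),
    "Wage expenditures": (0.0248, 0.0128),
    "Value of production": (0.0172, 0.0114),
}

K = 3.0  # assumed multiplier, inferred from the quoted limits (not stated in the text)

# Upper limit for each characteristic: estimate + K * standard deviation.
upper = {name: r + K * sd for name, (r, sd) in table1.items()}

for name, limit in upper.items():
    print(f"{name}: {100 * limit:.1f}%")
```

Under this assumption the limits come out at 6.5, 6.3, and 5.1 per cent, in agreement with the figures given.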

The relative distributions of the sampled establishments by employment,
wage expenditures and value of production are given in Table 2.

TABLE 2
RELATIVE DISTRIBUTIONS OF SAMPLED ESTABLISHMENTS

Distribution after first editing: Distribution after second editing:

The hypothesis that the distributions of the establishments by employment,
wage expenditures, and value of production after the first
editing are equal to the distributions by the same characteristics
which would have been obtained if the control editing had been complete
was tested with the following results.


TABLE 3

Distribution by          $K^2$     Significant
                                   deviation
Employment               19.529    No
Wage expenditures         4.363    No
Value of production      13.143    No





CONCLUSION 


The results of the experiment described in this paper indicate that
the effects of errors in editing questionnaires on statistics for a population
of small establishments were small. The tests did not reject our
hypothesis that there are no significant deviations between the distributions
obtained by the first editing procedure and those which would
have been obtained if all questionnaires were submitted to a control
editing. The accuracy gained by a thorough and expensive editing of
the questionnaires is small and may perhaps be obtained more cheaply by
improving the census procedure by other means.

THE RELATIONSHIP OF HOUSING PRICES AND
BUILDING COSTS IN LOS ANGELES, 1900-1953*


ROBERT M. WILLIAMS
University of California (Los Angeles)


Series measuring residential construction costs and asking 
prices for single family dwellings in Los Angeles both had 
steeply rising trends of approximately equal slope from 1900 
to 1953. Both series described clearly defined cyclical fluctua- 
tions of wide amplitude. However, asking prices, which are 
indicative of sales values, fluctuated more widely than con- 
struction costs. The ratio of asking prices to construction 
costs traced a cyclical pattern of large amplitude with troughs 
occurring in 1900, 1919, 1934, 1942, and 1950. Except for the 
immediate postwar period, 1945-1947, this ratio fluctuated 
less widely since 1934 than in the earlier part of the period 
analyzed. 


In a recent article in this Journal Blank compared secular movements
in average house prices in 22 cities with a construction cost index.¹
His purpose was to test the validity of the common practice of substituting
construction cost indexes (which are generally available) for
market price series (which are generally nonexistent) for property
assessment and other purposes. He concluded that, for the series examined,
“the construction cost index measures with quite reasonable
accuracy the secular movement of house prices.”² He showed, however,
that short-run divergences between the two series at one point exceeded
30 per cent, although the difference was less than 10 per cent during
most of the period from 1890 to 1934. Hence, for short-term analysis,
“some margins of error are involved in using the cost index as an
approximation of a price index.”³

The present article will examine fluctuations in housing prices and 
building costs in Los Angeles from 1900 to 1953. This analysis supports 
Blank’s conclusions concerning both the similarity in underlying trends 
in housing prices and building costs and the significant divergence 
between the two series from year to year. It will be shown, however, 
that the Los Angeles housing price index had much greater amplitude 
and traced a considerably more regular cyclical pattern than Blank’s 





* Much of the material contained in this article is derived from a study of Residential Construction
in Los Angeles, 1900-58, under the direction of the Bureau of Business and Economic Research, Uni- 
versity of California, Los Angeles. 

1 David M. Blank, “Relationship Between an Index of House Prices and Building Costs,” Vol. 49 
(1954), 67-78. 

2 Ibid., p. 78. 

3 Ibid., p. 78.

22-city price series. Finally, reasons will be suggested for the differences
in cyclical amplitude and regularity between the Los Angeles housing 
price series and that for the 22 cities combined. Before examining the 
relationship between housing prices and construction costs in more 
detail, the housing price series developed for Los Angeles will be de- 
scribed. 


AN INDEX OF ASKING PRICES FOR SINGLE FAMILY DWELLINGS IN 
LOS ANGELES 


Mr. Blank’s 22-city index of housing prices was based on the owner’s 
estimate of the value of his property in 1934 and his recollection of the 
price paid when the property was acquired.⁴ Correcting the data for
both normal depreciation and appreciation from improvements, he 
constructed an index of housing values from 1890 to 1934. 

The Los Angeles series is based on asking prices for single family 
dwellings. Asking-price data were selected quarterly from classified 
advertisements for 5- and 6-room houses built on standard sized lots. 
It is recognized that asking prices differ from actual selling prices by 
an average margin which may vary cyclically.⁵ Moreover, houses advertised
for sale may not be representative of the entire population of
single family dwellings, a population which itself is constantly changing 
in composition. 

This asking-price index was compared with a market-price series 
available from 1940 to 1953. The patterns of behavior in the two series 
were remarkably similar during this 13-year period in which average 
prices increased approximately 200 per cent and building costs nearly 
as much.⁶ The two independently constructed price series not only had
similar secular trends, but also had nearly identical year-to-year 
fluctuations. This parallel movement since 1940, plus the fact that the 
asking-price series displays the expected close correlation with residential
construction for the period from 1900 to 1940, suggests that a
carefully constructed asking-price index can be substituted for market 
prices. 

The close relationship between the asking-price index and new dwell- 
ing units authorized in Los Angeles County is shown in Table 1. Both 
series increased rapidly after 1900, turned down briefly under the im- 





4 Data for Blank’s analysis were derived from the Financial Survey of Urban Housing (Washington, 
D. C.: U. S. Department of Commerce, 1937).

5 A preliminary analysis of data supplied by multiple-listing services in the Los Angeles and San 
Francisco areas indicates that average sales price has varied in recent years between 90 and 96 per cent 
of the list or asking price. As might be expected, there is some evidence that the spread between sales 
and asking price is greater when values are declining than when they are increasing. 

6 See Williams, Robert M., “An index of asking prices for single family dwellings,” The Appraisal
Journal, 22 (1954), 33-8, for a description of the asking-price and market-price indexes for Los Angeles.

TABLE 1

ASKING-PRICE INDEX AND DWELLING UNITS AUTHORIZED
IN LOS ANGELES COUNTY, 1900-1953

        Asking-         Dwelling Units
Year    Price Index     Authorized
        (1940 = 100)    (in thousands)

1900    36
1901    43
1902    54
1903    61
1904    63

1932    89
1933    76
1934    75

pact of the 1907 recession in general business activity, again increased
to 1912 or 1913, and then declined to a lower turning point in 1917 or
1918. Both series increased rapidly in the early twenties, reached an
upper turning point in 1923 or 1924, and then declined until 1934. After
1934, both series again increased. The asking-price index, however,
declined in 1939 and 1940, while dwelling units authorized increased
each year until 1942, when wartime building restrictions were imposed.

Space does not permit more detailed analysis of the relationship 
between these two series. This brief description of parallel movement 
is presented to suggest that the asking-price index behaved as one 
would expect a market-price series to behave, because the rate of new 
residential construction would be expected to have a close relationship 
to the market prices of existing dwellings. Hence, because asking 
prices and rate of residential building are closely related, it appears that 
asking prices can be substituted for market prices in analyzing fluctua- 
tions in residential real estate series. 


THE RATIO OF ASKING PRICES TO BUILDING COSTS 


The index of asking prices for single family residences in Los Angeles 
from 1900 to 1953 is presented in column 1 of Table 2 and an index of 
residential construction costs in column 2. The cost series was based 
on the Boeckh index of building costs for frame houses in Los Angeles 
from 1913 to 1953. Because no local cost series is available for earlier 
years, Blank’s national index of construction costs was used from 1900 
to 1913. This substitution is probably satisfactory, since the behavior 
of the national cost index in this period closely paralleled the average 
permit value per dwelling unit authorized in Los Angeles, a series which 
fluctuates with construction costs except in periods of rapid cost or in- 
come change. It should be pointed out that actual building costs have 
greater cyclical fluctuation than building-cost indexes. This is because 
the latter largely exclude contractors’ profits, which vary much more 
than wage and material costs. Hence, using a building cost index exag- 
gerates somewhat the actual cyclical divergence of market values and 
replacement costs. 

As shown in Chart 1, the secular trends of the Los Angeles price 
and cost series are quite similar—as Blank found to be the case for the 
national series. The cost index increased nearly 500 per cent from 1900 
to 1953. The price index increased somewhat more in this period, but 
the difference is not significant. Rather, it reflects the fact that market 
prices were depressed in 1900, while in 1953 residential values were at 
record levels. 

Although their trends are alike, the two Los Angeles series show 
considerable divergence in the short run. Whereas Blank’s price series 
fluctuated above and below the cost series in a more or less erratic 
fashion, the Los Angeles price index fluctuates widely about the cost 
index in a clearly defined pattern. 





TABLE 2

ASKING-PRICE AND CONSTRUCTION-COST INDEXES IN
LOS ANGELES, 1900-1953 (1940 = 100)

        Asking-   Construction-   Ratio of             Asking-   Construction-   Ratio of
Year    Price     Cost            Price to     Year    Price     Cost            Price to
        Index     Index           Cost                 Index     Index           Cost
        (1)       (2)             (3)                  (1)       (2)             (3)

1900    36        44              82           1930    122       90              136
1901    43        43              100          1931    111       84              132
1902    54        45              120          1932    89        76              117
1903    61        46              133          1933    76        79              96
1904    63        46              137          1934    75        83              90









1905 66 48 138 86 83 
1906 70 53 132 92 88 
1907 74 55 134 107 97 
1908 70 110 102 
1909 70 104 97 


1910 69 100 
1911 72 107 
1912 72 113 
1913 73 130 
1914 69 156 


1915 188 
1916 ‘ 278 
1917 281 
1918 277 
1919 240 


1920 

1921 134 124 282 
1922 143 140 

1923 180 162 

1924 190 178 


1925 182 184 
1926 163 166 
1927 152 158 
1928 142 149 
1929 130 ¢ 134 





Source: Column 1—Index derived from quarterly average of median asking prices for 5- and 6-room 
houses in classified advertisements. 

Column 2—1900-1912: based on Blank’s national index of residential construction costs. See David 
M. Blank, “Relationship between an Index of House Prices and Building Costs,” Journal of the American 
Statistical Association, Vol. 49 (1954), p. 7%, Table 4 for sources. 

1913-1953: Boeckh building-cost index for frame residences as reported in Housing Statistics Handbook
and Engineering News-Record.

Between 1900 and 1934, two complete cycles of wide amplitude are
discernible in the ratio of prices to costs (see Chart 2). These cycles, 
which correspond almost exactly to those in new residential construc- 
tion, had periods of 19 and 15 years, respectively, from trough to 
trough. Their amplitudes, measured as the difference between the 
minimum and maximum values of the ratio in each cycle, were 64 and 
110 percentage points, respectively. After 1934, the ratio fluctuated 
less widely, except for the immediate postwar years, increasing from 
1934 to 1937 and then declining until 1942. New residential construc- 
tion, however, continued to increase each year from 1934 to 1941 (see 
Table 1). 
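
The ratio in column 3 of Table 2 and the amplitude measure just described can be sketched in a few lines of Python. The figures are the 1900-1904 rows of Table 2; the function names are illustrative, and the amplitude printed here covers only these five years, not a full trough-to-trough cycle, so it does not reproduce the 64 and 110 points cited above.

```python
# Rows from Table 2: year -> (asking-price index, construction-cost index), 1940 = 100.
table2 = {
    1900: (36, 44),
    1901: (43, 43),
    1902: (54, 45),
    1903: (61, 46),
    1904: (63, 46),
}


def price_cost_ratio(price_index, cost_index):
    """Ratio of asking prices to construction costs, expressed as an index (column 3)."""
    return 100.0 * price_index / cost_index


def amplitude(values):
    """Difference between the maximum and minimum values of the ratio over a span."""
    return max(values) - min(values)


ratios = {year: round(price_cost_ratio(p, c)) for year, (p, c) in table2.items()}
print(ratios)
print("range over 1900-1904:", amplitude(ratios.values()), "percentage points")
```

Applied to a full cycle of the ratio, the same `amplitude` function would give the trough-to-peak spread used in the text.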

RATIO OF ASKING PRICES TO CONSTRUCTION COSTS

Several factors may have contributed to differences in behavior
between Blank’s 22-city price series and that for Los Angeles. One is
Blank’s method of correcting his price series for depreciation and
capital improvement by making a uniform adjustment over the cycle.
This may have produced a more stable price series than would otherwise
have resulted. Moreover, in combining data for 22 cities, differences
in cyclical turning points would blur the cyclical pattern existing in
series for individual cities. Finally, Los Angeles has grown very rapidly 
—more than doubling in population in each of the two decades begin- 
ning in 1900 and 1920. These upsurges in population coincided with 
sharp increases in rents, housing values, and rate of new construction. 
The available data for other cities, however, including B.L.S. series on 
rents and new dwelling units authorized, indicate that wide fluctuations 
in real estate series have occurred in more slowly growing areas also. 


CONCLUSIONS 


Wide fluctuations occurred in the ratio of housing prices to construc- 
tion costs in Los Angeles from 1900 to 1934 and also after World War 
II. Hence, a cost index could not have been used satisfactorily to 
measure changes in housing market values in these periods. 








A TEST OF A LINEAR FUNCTION OF THE DEVIATIONS 
BETWEEN OBSERVED AND EXPECTED NUMBERS* 


WILLIAM G. COCHRAN
The Johns Hopkins University 


I. INTRODUCTION 


As is well known, the $\chi^2$ test of goodness of fit is not directed against
any specific pattern of the deviations $(f_i - m_i)$ of the observed
frequencies $f_i$ from the expected frequencies $m_i$. For this reason, the
$\chi^2$ test is sometimes insensitive in detecting a failure of the null hypothesis.
There are, however, a number of alternative or supplementary
tests that may be used when it is possible, from the nature of the
problem, to predict the type of alternative hypothesis that is most
likely to hold if the null hypothesis fails. These tests include a comparison
of the variances, or of the third and fourth moments, of the
observed and theoretical distributions, and various ways of breaking
down $\chi^2$ into components [1].

One additional test of this kind is obtained by selecting any linear 
function of the deviations, 


$$L = \sum_i g_i (f_i - m_i),$$

where the $g_i$ are numbers, chosen in advance by the person making
the test, in such a way that $L$ will be sensitive to the alternative hypothesis
that is thought most likely to hold. By suitable assignment of
the numbers $g_i$, the criterion $L$ can be made responsive to any anticipated
pattern of deviations, either in their signs or in their magnitudes.
In particular, if all but one $g_i$ are put equal to zero we obtain the test
of an individual deviation. This paper describes how to make an
approximate test of significance of the value of $L$. The test is approximate
in roughly the same sense in which the goodness of fit $\chi^2$ test is
itself approximate, i.e., the test is strictly valid as an asymptotic result
when the expectations become large. The theory of the test will be
presented first, followed by several illustrative examples.


II. EXPECTATIONS GIVEN IN ADVANCE 


For simplicity, we consider first the case in which the expectations 
are completely specified by the null hypothesis, so that there are no





* Work assisted by a contract with the Office of Naval Research, Navy Department. Department 
of Biostatistics paper No. 301.

unknown parameters to be estimated. This is not the common situation,
but it does occur, for instance, in testing Mendelian inheritance
where the binomial p is given by theory.

Let the known expectations be denoted by the symbols $M_i$ and let

$$N = \sum_i M_i = \sum_i f_i$$

be the total size of sample. On the null hypothesis, the observed frequencies
$f_i$ follow a multinomial distribution with the following properties:

Mean:

$$E(f_i) = M_i \tag{1}$$

Variance:

$$V(f_i) = M_i\left(1 - \frac{M_i}{N}\right) \tag{2}$$

Covariances:

$$\operatorname{Cov}(f_i, f_j) = -\frac{M_i M_j}{N} \quad (i \ne j). \tag{3}$$

Then, taking

$$L' = \sum_i g_i (f_i - M_i),$$

we have, on the null hypothesis,

$$E(L') = 0$$

$$V(L') = \sum_i g_i^2 V(f_i) + 2\sum_{i<j} g_i g_j \operatorname{Cov}(f_i, f_j)$$

$$= \sum_i g_i^2 M_i\left(1 - \frac{M_i}{N}\right) - \frac{2}{N}\sum_{i<j} g_i g_j M_i M_j$$

$$= \sum_i g_i^2 M_i - \frac{1}{N}\left(\sum_i g_i M_i\right)^2. \tag{4}$$

This is an old result, and is exact for any size of sample [3, §55].

Further, as $N$ becomes large, with fixed $p$'s, so that the $M_i$ become
large, the multinomial distribution of the $f_i$ tends to a multivariate normal
distribution [2, §30.1], and $L'$ tends to be normally distributed.
Hence, the test of significance is made either by treating $L'/\sigma(L')$ as
a normal deviate, or by treating

$$\frac{L'^2}{V(L')}$$

as $\chi^2$ with 1 degree of freedom.
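
When the expectations are given in advance, the test reduces to the arithmetic of equation (4). The Python sketch below is a minimal illustration under that assumption; the function name and the 9:3:3:1 frequencies are hypothetical and are not taken from any data set discussed here.

```python
import math


def linear_test_known(g, f, M):
    """Return (L', V(L'), normal deviate) for known expectations M, using equation (4)."""
    N = sum(M)
    L = sum(gi * (fi - Mi) for gi, fi, Mi in zip(g, f, M))
    var = (sum(gi * gi * Mi for gi, Mi in zip(g, M))
           - sum(gi * Mi for gi, Mi in zip(g, M)) ** 2 / N)
    return L, var, L / math.sqrt(var)


# Hypothetical sample of 160 with expectations in the ratio 9:3:3:1.
M = [90, 30, 30, 10]
f = [84, 35, 28, 13]

# Putting all but one g_i equal to zero tests a single deviation (here the last class).
g = [0, 0, 0, 1]
L, var, z = linear_test_known(g, f, M)
print(L, var, round(z, 3))
```

With a single nonzero weight the variance from (4) is $M_i(1 - M_i/N)$, the multinomial variance in (2), which gives a quick check on the computation.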


III. ONE-PARAMETER ESTIMATION 


When the expectations $m_i$ are estimated from the sample, the situation
is more complex, and the formula to be given for $V(L)$ is valid
only when the expectations are large. Suppose that the $M_i$ are known
functions of a single unknown parameter $\theta$, of which the sample
estimate is $\hat\theta$. Maximum likelihood estimation, or some asymptotically
identical method, is assumed. The symbol $L$ will denote the linear
function when the expectations are estimated, while $L'$ will be used,
as in Section II, when the expectations are known.

We now have

$$L = \sum_i g_i (f_i - m_i)$$

where the $m_i$ are the values taken by the $M_i$ when $\theta = \hat\theta$. We will first
find the variances and covariances of the $(f_i - m_i)$, noting that the $m_i$
are now functions of the sample observations.

To save space, two standard results in the theory of maximum likelihood
estimation will be quoted. These results, and most results that
follow from them, are valid apart from terms that can be neglected
when the expectations are large. The symbol $\doteq$ denotes an equation
of this type.

The first assumed result is

$$\hat\theta - \theta \doteq \frac{1}{I}\sum_i \frac{(f_i - M_i)}{M_i}\frac{\partial M_i}{\partial\theta}, \tag{5}$$

where

$$I = \sum_i \frac{1}{M_i}\left(\frac{\partial M_i}{\partial\theta}\right)^2 \tag{6}$$

is Fisher’s “amount of information,” and is the inverse of the asymptotic
variance of $\hat\theta$. (In expositions of maximum likelihood theory, the
result (5) usually appears in a slightly different form. For instance,
in Cramér [2], equation (53.3.4), the right side of (5) has an additional
denominator which, as Cramér shows, converges in probability to 1
when the expectations become large.)

The second assumed result is

$$m_i - M_i \doteq (\hat\theta - \theta)\frac{\partial M_i}{\partial\theta}, \tag{7}$$

this being obtained by the first term of a Taylor expansion. Equations
(5) and (7) require certain restrictions on the forms of the functions
$M_i$ and their derivatives (cf. Cramér, loc. cit.), but I believe that these
are satisfied in any of the common applications.

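Substitution of the value of $(\hat\theta - \theta)$ from (5) into (7) gives

$$m_i - M_i \doteq \frac{1}{I}\frac{\partial M_i}{\partial\theta}\sum_j \frac{(f_j - M_j)}{M_j}\frac{\partial M_j}{\partial\theta}.$$

Hence

$$f_i - m_i = (f_i - M_i) - (m_i - M_i) \doteq (f_i - M_i) - \frac{1}{I}\frac{\partial M_i}{\partial\theta}\sum_j \frac{(f_j - M_j)}{M_j}\frac{\partial M_j}{\partial\theta}. \tag{8}$$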


This is the key equation. It expresses $(f_i - m_i)$ as a linear function
of the deviations $(f_i - M_i)$ from the true expectations. Since the variances
and covariances of these latter deviations are known from equations
(2) and (3), we can now find the variance of $(f_i - m_i)$, or of any
linear function $L$ of these deviations. Equation (8) also implies that,
to the present order of approximation, $E(L) = 0$ when the null hypothesis
holds.

Instead of proceeding directly, we shall follow a different route that
appears to simplify the algebra. The right side of equation (8) may be
interpreted as the deviation of $(f_i - M_i)$ from its linear regression on
the variate

$$X = \sum_j \frac{(f_j - M_j)}{M_j}\frac{\partial M_j}{\partial\theta}. \tag{9}$$

To see this, we have from (2) and (3)

$$\operatorname{Cov}[(f_i - M_i), X] = \frac{\partial M_i}{\partial\theta} - \frac{M_i}{N}\sum_j \frac{\partial M_j}{\partial\theta} = \frac{\partial M_i}{\partial\theta}, \tag{10}$$

since $\sum_j M_j = N$, so that its derivative vanishes. Similarly, we find

$$V(X) = \sum_j \frac{1}{M_j}\left(\frac{\partial M_j}{\partial\theta}\right)^2 = I. \tag{11}$$

Thus the regression coefficient of $(f_i - M_i)$ on $X$ is

$$b_i = \frac{\operatorname{Cov}[(f_i - M_i), X]}{V(X)} = \frac{1}{I}\frac{\partial M_i}{\partial\theta}.$$

Hence, equation (8) may be rewritten as

$$(f_i - m_i) = (f_i - M_i) - b_i X.$$

Since

$$L' = \sum_i g_i (f_i - M_i),$$

it follows that

$$L = \sum_i g_i (f_i - m_i) = L' - bX,$$

where $b = \sum_i g_i b_i$ is the regression coefficient of $L'$ on $X$.


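Hence the variance of $L$ is equal to the variance of the deviations of
$L'$ from its linear regression on $X$, i.e.,

$$V(L) = V(L') - \frac{[\operatorname{Cov}(L', X)]^2}{V(X)}. \tag{12}$$

But, from (10) and (11),

$$\operatorname{Cov}(L', X) = \sum_i g_i \frac{\partial M_i}{\partial\theta}; \qquad V(X) = I.$$

This gives, finally,

$$V(L) = \sum_i g_i^2 M_i - \frac{\left(\sum_i g_i M_i\right)^2}{N} - \frac{1}{I}\left(\sum_i g_i \frac{\partial M_i}{\partial\theta}\right)^2. \tag{13}$$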


In practice, we substitute the computed expectations $m_i$ in place of
$M_i$. For testing a single deviation, we have

$$\hat V(f_i - m_i) = m_i - \frac{m_i^2}{N} - \frac{1}{I}\left(\frac{\partial m_i}{\partial\theta}\right)^2, \tag{14}$$

where $\hat V$ denotes an estimated variance.

As in the case where the expectations are given, the test is made
either by treating $L/s(L)$ as a normal deviate, or by treating $L^2/\hat V(L)$
as $\chi^2$ with 1 degree of freedom.

Like the goodness of fit test, this test requires some restriction on
the smallness of the expectations. The restriction needed will depend
on the form of the function $L$. It might be safe to allow some expectations
as low as 1 or 2 if these expectations receive relatively small
weights $g_i$ in computing $L$. An example of the exact small-sample distribution
is presented in the Appendix. So far as it goes, it suggests
that the normal approximation may work about as well as does the
tabular $\chi^2$ approximation in the ordinary goodness of fit test. Pending
further investigation, it seems well to follow the common rule that the
minimum expectation should not be less than 5, particularly when a
single deviation is being tested. With a single deviation, it is advisable
to apply a correction for continuity, by taking the normal deviate as

$$\frac{|f_i - m_i| - \tfrac{1}{2}}{s(L)}.$$

There is a kind of intuitive interpretation to the result that the
variance of $L$ equals the variance of the deviations of $L'$ from its
regression on $X$. In Section II, when no parameters were being estimated,
the deviations $(f_i - M_i)$ were subject to the single restriction,

$$\sum_i (f_i - M_i) = 0.$$


It is this constraint that introduces the negative covariance in equation
(3) between $f_i$ and $f_j$. When a parameter is being estimated, the
maximum likelihood equation imposes a further constraint, i.e.,

$$\sum_i \frac{f_i}{M_i}\frac{\partial M_i}{\partial\theta} = 0,$$

which may be written

$$X = \sum_i \frac{(f_i - M_i)}{M_i}\frac{\partial M_i}{\partial\theta} = 0.$$

Thus the equation of estimation may be regarded as fixing the value 
of X. This additional restraint leaves the observed frequencies less 
free to deviate from the theoretical frequencies, and may be expected 
to diminish the variance of L as compared with that of L’. It is not
surprising that the appropriate variance is now the variance of the 
deviations from the regression on the quantity X that is constrained 
by the equation of estimation. 





IV. APPLICATION TO THE BINOMIAL AND POISSON DISTRIBUTIONS 


Perhaps the most common applications of goodness of fit tests are 
those to problems in which the null hypothesis specifies either a bi- 
nomial or a Poisson distribution. Formulas for V(L) appropriate to 
these cases can be obtained by substituting the appropriate expressions 
for the $M_i$ and their first derivatives in (13).

The binomial and Poisson distributions have the common property
that the equation of estimation makes the sample mean equal to the 
theoretical mean. By applying the “regression” argument, we can 
obtain a more general formula for V(L), applicable to any discrete 
distribution in which the maximum likelihood estimate is the sample 
mean (or a function of it). 

Let $f_i$ denote the frequency of $i$ “successes” ($i = 0, 1, 2, \ldots$). We
may take $X$ as

$$X = \sum_i i (f_i - M_i),$$

since the equation of estimation makes this quantity zero.
Then, it is easy to verify that

$$\operatorname{cov}(L', X) = \sum_i i g_i M_i - \frac{\left(\sum_i g_i M_i\right)\left(\sum_i i M_i\right)}{N} = \sum_i g_i M_i (i - \mu),$$

where $\mu$ is the mean of the theoretical distribution. Also

$$V(X) = N\sigma^2,$$

where $\sigma^2$ is the variance of the theoretical distribution.
Hence, for the estimated variance of deviations from the regression,

$$\hat V(L) = \sum_i g_i^2 m_i - \frac{\left(\sum_i g_i m_i\right)^2}{N} - \frac{\left[\sum_i g_i m_i (i - \hat\mu)\right]^2}{N\hat\sigma^2}, \tag{16}$$

where we have substituted sample estimates for the unknown theoretical
values involved. For a sample of $N$ from the binomial $(q + p)^n$, we
substitute

$$\hat\mu = n\hat p; \qquad \hat\sigma^2 = n\hat p\hat q.$$

For a sample of $N$ from the Poisson, we substitute

$$\hat\mu = \hat\sigma^2 = \bar x = \text{sample mean}.$$

V. EXAMPLES


Example 1. Fisher [3] has analyzed Geissler’s data on the distribution
of number of boys in 53,680 German families of size 8 (see Table 1).
The null hypothesis is that the number of boys follows the binomial

$$(q + p)^8$$

where $p$ is the proportion of boys: $\hat p$, as estimated from the data, is
0.51468.

TABLE 1
NO. OF BOYS IN FAMILIES OF 8

No. of boys    No. of families
               Observed    Expected      Deviations
i              f_i         m_i           f_i - m_i

0                 215         165.22     + 49.78
1               1,485       1,401.69     + 83.31
2               5,331       5,202.65     +128.35
3              10,649      11,034.65     -385.65
4              14,959      14,627.60     +331.40
5              11,929      12,409.87     -480.87
6               6,678       6,580.24     + 97.76
7               2,092       1,993.78     + 98.22
8                 342         264.30     + 77.70

Total          53,680      53,680.00











Fisher noted, as is apparent in Table 1, an excess of families with 
very unequal numbers of boys and girls. He also noted an apparent bias 
in favor of even numbers of boys. This bias shows up in the central values
(2-6 boys). At the extremes, the effect is obscured by the excess of 
unequally divided families. 

To test whether there is an excess of families with even numbers of
boys, we may take

$$g_i = +1 \ (i \text{ even}); \qquad g_i = -1 \ (i \text{ odd}).$$

We find

$$L = \sum_i g_i (f_i - m_i) = +1369.98.$$

To apply formula (16) for $\hat V(L)$, we compute

$$\sum_i g_i^2 m_i = 53{,}680$$
$$\sum_i g_i m_i = 0.02$$
$$\sum_i i g_i m_i = 0.09$$
$$\hat\mu = 4.1174$$
$$\hat\sigma^2 = 8\hat p\hat q = 1.9982$$
$$N = 53{,}680.$$

Now apply formula (16)

$$\hat V(L) = \sum_i g_i^2 m_i - \frac{\left(\sum_i g_i m_i\right)^2}{N} - \frac{\left[\sum_i g_i m_i (i - \hat\mu)\right]^2}{N\hat\sigma^2}.$$

It is clear that the two subtraction terms are entirely negligible, so that

$$\hat V(L) = \sum_i g_i^2 m_i = 53{,}680.$$

The normal deviate is

$$\frac{+1369.98}{\sqrt{53{,}680}} = +5.91,$$

indicating a significant excess of families with even numbers of boys.

The reason why the two subtraction terms are negligible is somewhat
peculiar to this example. In a binomial with $p = \tfrac{1}{2}$ and $n$ even, the
quantities $\sum g_i m_i$ and $\sum i g_i m_i$ both vanish, as algebraic identities, for
this set of $g_i$. Here we have a binomial with $p$ nearly $\tfrac{1}{2}$. As an exercise
in the computations, the reader may try

$$g_i = +1 \ (i \text{ even}); \qquad g_i = 0 \ (i \text{ odd})$$

so as to test the sum of the even deviations. The normal deviate will
again be found to be +5.91, as would be expected, but the subtraction
terms are not negligible.
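
The computations of Example 1, including the exercise just suggested, can be checked with a short Python sketch of formula (16). The observed and expected frequencies are those of Table 1; the function and variable names are illustrative choices for this sketch.

```python
import math


def var_L(g, m, mu, sigma2):
    """Estimated variance of L from formula (16); class i is the list index."""
    N = sum(m)
    t1 = sum(gi * gi * mi for gi, mi in zip(g, m))
    t2 = sum(gi * mi for gi, mi in zip(g, m)) ** 2 / N
    t3 = sum(gi * mi * (i - mu) for i, (gi, mi) in enumerate(zip(g, m))) ** 2 / (N * sigma2)
    return t1 - t2 - t3


# Observed and expected numbers of families with i = 0, ..., 8 boys (Table 1).
f = [215, 1485, 5331, 10649, 14959, 11929, 6678, 2092, 342]
m = [165.22, 1401.69, 5202.65, 11034.65, 14627.60, 12409.87, 6580.24, 1993.78, 264.30]

p_hat = 0.51468
mu = 8 * p_hat                     # binomial mean, n * p
sigma2 = 8 * p_hat * (1 - p_hat)   # binomial variance, n * p * q


def normal_deviate(g):
    L = sum(gi * (fi - mi) for gi, fi, mi in zip(g, f, m))
    return L / math.sqrt(var_L(g, m, mu, sigma2))


g_signed = [1 if i % 2 == 0 else -1 for i in range(9)]  # +1 for even i, -1 for odd i
g_even = [1 if i % 2 == 0 else 0 for i in range(9)]     # +1 for even i, 0 for odd i

z_signed = normal_deviate(g_signed)
z_even = normal_deviate(g_even)
print(round(z_signed, 2), round(z_even, 2))
```

For the second set of weights the term $(\sum g_i m_i)^2/N$ removes about half of $\sum g_i^2 m_i$, so the subtraction terms are far from negligible there, yet both choices lead to the same normal deviate.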

Although this example serves to illustrate the computations needed 
in applying the test to binomial data, there are questions about the 
validity of the application. In working out the frequency distribution 
of L, we have assumed that the coefficients g; are chosen before seeing 
the data, whereas the function L was actually constructed for a type 
of departure from the binomial that was observed in the data. The 
effect is to make the L test give too many apparently significant 
results. This point will be discussed further in section VI. Secondly, 
as already noted, there is an excess of families with very uneven num- 
bers of boys and girls, so that the binomial model used for the null 
hypothesis probably does not apply exactly. This disturbance will 
also influence to some degree the frequency distribution of L, just as
non-normality in the basic data influences the t-distribution in a
Student's t-test.

Example 2. This and example 3 are artificial, and are intended to 
illustrate two properties of the test. 

In 100 families of size 2, the frequency with which some attribute
occurs is recorded. The binomial $(q+p)^2$ is fitted, the estimate $\hat{p}$
being 0.2.


TABLE 2
GOODNESS OF FIT TEST OF BINOMIAL DISTRIBUTION

  i      f_i     m_i     Contr. to $\chi^2$
  0       60      64        0.25
  1       40      32        2.00
  2        0       4        4.00
  N      100     100        6.25 (1 d.f.)


In this example the goodness of fit $\chi^2$ has 1 d.f. Hence if any one of
the single deviations $L = (f_i - m_i)$ is tested by use of equation (16), the
$\chi^2$ value for the single deviation should also equal 6.25.

For instance, consider the deviation for i = 2. From equation (16),

$$V(f_2 - m_2) = m_2 - \frac{m_2^2}{N} - \frac{m_2^2 (2 - \hat{\mu})^2}{N\hat{\sigma}^2}.$$

Since $\hat{\mu} = n\hat{p} = 0.4$, this gives

$$V(f_2 - m_2) = 4 - \frac{16}{100} - \frac{16(2.56)}{(100)(0.4)(0.8)} = 2.56,$$

$$\chi_1^2 = \frac{16}{2.56} = 6.25.$$






The reader may verify that the same value of $\chi_1^2$ is obtained from the
deviations for i = 0 and i = 1. (Corrections for continuity were omitted
in this example, since the purpose is to point out an algebraic
relationship.)
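
The verification suggested above may be sketched as follows. The code assumes the single-deviation form of equation (16) used in Example 2, namely V(f_i − m_i) = m_i − m_i²/N − m_i²(i − μ̂)²/(Nσ̂²); the function name is illustrative only.

```python
from math import comb

def single_deviation_chi2(f, n, N):
    """Chi-square (1 d.f.) for each single deviation f_i - m_i of a fitted binomial."""
    p = sum(i * fi for i, fi in enumerate(f)) / (n * N)  # ML estimate of p
    q = 1 - p
    mu, sigma2 = n * p, n * p * q
    values = []
    for i, fi in enumerate(f):
        m = N * comb(n, i) * p**i * q**(n - i)            # fitted expectation
        var = m - m**2 / N - m**2 * (i - mu)**2 / (N * sigma2)
        values.append((fi - m)**2 / var)
    return values

# Data of Example 2: observed frequencies for i = 0, 1, 2
chi2_values = single_deviation_chi2([60, 40, 0], n=2, N=100)
print([round(v, 2) for v in chi2_values])
```

All three classes give the same value because, with 1 d.f., each deviation is a fixed multiple of the others.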

Example 3. A Poisson distribution is fitted to a sample of size 100,
in a situation in which the goodness of fit $\chi^2$ again has 1 degree of
freedom (Table 3). If a $\chi^2$ value is computed separately for each of the
three deviations, we find $\chi_1^2 = 0.365$ for i = 0; 0.554 for i = 1; and 0.728
for i = 2. Thus the "single deviation" tests give different results from
one another and from the goodness of fit test, in contrast to the result
in Example 2.

The discrepancies, which are puzzling at first sight, are a consequence
of the grouping of the expectations for i = 2, 3, 4, ... which occurs in
the 2+ class. On account of this grouping, the value of $\sum i m_i$, from
Table 3, is 25.743, whereas $\sum i f_i$ is 26. Thus the sample value of

$$X = \sum i (f_i - m_i)$$

equals 0.257 instead of 0. The three "single deviation" $\chi^2$ can be
brought into much closer agreement with the goodness of fit $\chi^2$ by
computing the numerators as

$$[(f_i - m_i) - bX]^2,$$

where X = 0.257.

On reflection, however, I doubt whether this adjustment is worthwhile,
because the goodness of fit $\chi^2$ is also based on the assumption
that $\sum i f_i = \sum i m_i$. In other words, all four values of $\chi^2$ are
approximate, and it is not clear that any one of them should be regarded as
superior or preferable.
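
The source of the discrepancy can be checked numerically. The sketch below refits the Poisson distribution of Example 3, assuming (as the quoted total of 26 implies) that every observation in the 2+ class is scored as 2; variable names are ours.

```python
import math

f = [78, 18, 4]                       # observed frequencies for i = 0, 1, 2+
N = sum(f)
mean = (0 * f[0] + 1 * f[1] + 2 * f[2]) / N   # sample mean, 0.26

m0 = N * math.exp(-mean)              # expectation for i = 0
m1 = N * mean * math.exp(-mean)       # expectation for i = 1
m2_plus = N - m0 - m1                 # grouped expectation for i = 2, 3, ...
m = [m0, m1, m2_plus]

contributions = [(fi - mi)**2 / mi for fi, mi in zip(f, m)]
total_chi2 = sum(contributions)

# Scoring the grouped class as 2 makes sum(i * m_i) fall short of sum(i * f_i)
sum_im = m1 + 2 * m2_plus
X = (f[1] + 2 * f[2]) - sum_im
print(round(total_chi2, 3), round(sum_im, 2), round(X, 2))
```

The shortfall X arises solely because expectations for i = 3, 4, ... are counted with the score 2.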


TABLE 3
GOODNESS OF FIT TEST OF POISSON DISTRIBUTION

  i      f_i      m_i       Contr. to $\chi^2$
  0       78     77.105        0.010
  1       18     20.047        0.209
  2+       4      2.848        0.466
  N      100    100            0.685 (1 d.f.)





Example 4. This example illustrates the test of a single deviation in 
a frequency distribution that is less familiar than the binomial or 
Poisson. The figures in Table 4 show the number of adult syphilis 
patients remaining on the roster of a Baltimore clinic at the beginning 
of successive two-month periods of observation. All the data refer to 
the same initial group of 232 patients, so that the successive observa- 
tions are not independent. 

The data are suggestive of an exponential decay curve. Perhaps the
simplest mathematical framework that might apply is to suppose that
in the hypothetical population of which these data are a sample, there
is a constant probability p that any person on the roster at the
beginning of a two-month period will drop out during the period. The
proportion of the population dropping out in the ith period is then $pq^{i-1}$,
and the proportion remaining at the end of the 11 periods is $q^{11}$. On
this argument, the successive numbers dropped in the sample should
follow a multinomial distribution with expectations as shown in Table
5. Note that it is necessary to include the number remaining at the end
of 11 periods (i.e., the number who would drop out in periods 12+) in
order that the numbers add to the original total of 232.


TABLE 4
NUMBER OF PATIENTS REMAINING ON CLINIC ROSTER
AT BEGINNING OF SUCCESSIVE TWO-MONTH PERIODS

 Month following      Number of patients
 admission            on roster
        0                  232
        2                  171
        4                  148
        6                  134
        8                  121
       10                  104
       12                   90
       14                   78
       16                   67
       18                   61
       20                   56
       22                   52


TABLE 5
PATIENTS DROPPED FROM ROSTER

                      Number dropped
 Period          Observed    Expected    f_i - m_i     $\chi^2$
                  (f_i)       (m_i)
  1 (0-2)           61        33.09       +27.91       23.54
  2 (2-4)           23        28.37       - 5.37        1.02
  3 (4-6)           14        24.32       -10.32        4.38
  4 (6-8)           13        20.85       - 7.85        2.96
  5 (8-10)          17        17.88       - 0.88        0.04
  6 (10-12)         14        15.33       - 1.33        0.12
  7 (12-14)         12        13.14       - 1.14        0.10
  8 (14-16)         11        11.27       - 0.27        0.01
  9 (16-18)          6         9.66       - 3.66        1.39
 10 (18-20)          5         8.28       - 3.28        1.30
 11 (20-22)          4         7.10       - 3.10        1.35
 12 (22+)           52        42.69       + 9.31        2.03
                   232       231.98                    38.24

















There is a simple maximum likelihood estimate of p. Let N be the
initial number on the roster, $N_i$ the number remaining at the end of
the ith period, and $f_i$ the number dropped during the ith period, so
that $f_1 = N - N_1$ : $f_i = N_{i-1} - N_i$, $(i = 2, \ldots, 11)$ : $f_{12} = N_{11}$.

The probability of the sample is, apart from factors not involving p,

$$P(S) \sim p^{f_1} (pq)^{f_2} (pq^2)^{f_3} \cdots (pq^{10})^{f_{11}} (q^{11})^{f_{12}}$$
$$= p^{(f_1 + f_2 + \cdots + f_{11})}\, q^{(f_2 + 2f_3 + \cdots + 11 f_{12})}.$$

But
$$f_1 + f_2 + \cdots + f_{11} = (N - N_1) + (N_1 - N_2) + \cdots + (N_{10} - N_{11}) = N - N_{11} = D,$$
where D is the total number dropped during the 11 periods.
Also
$$f_2 + 2f_3 + \cdots + 11 f_{12} = (N_1 - N_2) + 2(N_2 - N_3) + \cdots + 10(N_{10} - N_{11}) + 11 N_{11}$$
$$= N_1 + N_2 + \cdots + N_{11} = T - D \ \text{(say)}$$
where
$$T = N + N_1 + \cdots + N_{10},$$
is the total of the numbers remaining at the beginning of all periods,
excluding the last period. Writing $\Lambda$ for the log of the likelihood, we
have

$$\Lambda = \log P(S) \sim D \log p + (T - D) \log q$$
$$\frac{d\Lambda}{dp} = \frac{D}{p} - \frac{(T - D)}{q}. \qquad (17)$$

This gives

$$\hat{p} = \frac{D}{T}.$$

This estimate is a rather natural one. The number dropped in any
period, divided by the number present at the beginning of the period,
is an unbiased estimate of p. The estimate $\hat{p}$ is the total of the numbers
dropped, divided by the total of the numbers present.

From Table 4

$$D = 232 - 52 = 180 : \quad T = 232 + 171 + \cdots + 56 = 1262$$
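
As a check on the fitting, the following sketch recomputes D, T, the estimate of p and the expectations of Table 5 directly from the roster counts of Table 4. It assumes only the model stated above (a constant drop-out probability p per period); the variable names are ours.

```python
# Roster counts at months 0, 2, ..., 22 (Table 4)
roster = [232, 171, 148, 134, 121, 104, 90, 78, 67, 61, 56, 52]

N = roster[0]
D = N - roster[-1]            # total dropped during the 11 periods
T = sum(roster[:-1])          # numbers present at the start of periods 1..11
p_hat = D / T
q_hat = 1 - p_hat

# Observed drops in periods 1..11, plus those still on the roster (class 12+)
observed = [roster[i - 1] - roster[i] for i in range(1, 12)] + [roster[-1]]
# Expected numbers N p q^(i-1) for i = 1..11, and N q^11 for the last class
expected = [N * p_hat * q_hat**(i - 1) for i in range(1, 12)] + [N * q_hat**11]

chi2 = sum((f - m)**2 / m for f, m in zip(observed, expected))
print(round(p_hat, 5), round(expected[0], 2), round(chi2, 2))
```

The expectations sum to 232 by construction, since the last class absorbs everything not dropped in the first 11 periods.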


$$\hat{p} = \frac{180}{1262} = 0.14263.$$

The expectations and the deviations are given in Table 5. Clearly,
the first period is aberrant, the number dropped being greatly above
expectation.

In order to test the deviation for the first period, we use the general
formula (14) for its estimated variance, i.e.,

$$V(f_1 - m_1) = m_1 - \frac{m_1^2}{N} - \frac{1}{I}\left(\frac{\partial M_1}{\partial p}\right)^2. \qquad (18)$$

Since $M_1 = Np$,

$$\frac{\partial M_1}{\partial p} = N = 232.$$

In a problem of this type, I is usually most easily found by means of
the relation

$$I = E\left(-\frac{\partial^2 \Lambda}{\partial p^2}\right).$$

By differentiating equation (17), we have

$$\frac{\partial^2 \Lambda}{\partial p^2} = -\frac{D}{p^2} - \frac{(T - D)}{q^2}.$$

Since

$$E(D) = N(1 - q^{11}),$$
$$E(T - D) = N(q + q^2 + \cdots + q^{11}) = Nq(1 - q^{11})/p,$$

we find

$$I = E\left(-\frac{\partial^2 \Lambda}{\partial p^2}\right) = \frac{N(1 - q^{11})}{p^2 q}.$$

Substitution in formula (18) gives the desired variance (where the
last term is written in a form convenient for computation).

$$V(f_1 - m_1) = m_1 - \frac{m_1^2}{N} - \frac{(Np)^2 q}{N(1 - q^{11})}$$
$$= (33.09) - \frac{(33.09)^2}{232} - \frac{(33.09)^2 (0.85737)}{189.31} = 23.411.$$

Finally, applying the correction for continuity to $f_1 - m_1 = 27.91$,

$$\chi_1^2 = \frac{(27.41)^2}{23.411} = 32.09.$$

As would be anticipated, this value is highly significant.

This deviation can be tested, alternatively, by fitting the same model
to the data from the second period onwards, starting with the 171
patients left at the end of the first period. The value of the
corresponding goodness of fit $\chi^2$ is 6.18, with 9 d.f., as against the original goodness
of fit $\chi^2$ of 38.24, with 10 d.f. The difference, 32.06, can be regarded as
a test of the fit during the first period. The discrepancy between the
values 32.06 and 33.27 (the value given by the L test if no correction
for continuity is applied) is presumably due to "small sample" effects.
Both methods lead to the conclusion that the model fits satisfactorily
after the first period, but that the loss during the first period is too
large.
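
The variance and the two values of $\chi_1^2$ quoted above (with and without the correction for continuity) can be reproduced as follows. The sketch assumes the information I = N(1 − q¹¹)/(p²q) obtained from equation (17); names are illustrative.

```python
N, D, T = 232, 180, 1262
p = D / T
q = 1 - p

m1 = N * p                                  # expectation for the first period
info = N * (1 - q**11) / (p**2 * q)         # I = E(-d^2 Lambda / dp^2)
dM1_dp = N                                  # since M_1 = N p

# Formula (18): variance of the single deviation f_1 - m_1
var = m1 - m1**2 / N - dM1_dp**2 / info

dev = 61 - m1
chi2_uncorrected = dev**2 / var
chi2_corrected = (abs(dev) - 0.5)**2 / var  # correction for continuity
print(round(var, 3), round(chi2_corrected, 2), round(chi2_uncorrected, 2))
```

Carrying full precision instead of the rounded value 33.09 changes the results only in the last printed digit.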


VI. TESTS SELECTED AFTER INSPECTION OF THE DATA 


In the tests described in this paper, the coefficients $g_i$ must be
chosen before inspecting the deviations. With the test for a single
deviation, it is tempting to apply the test to a deviation for which the
contribution to $\chi^2$, i.e., $(f_i - m_i)^2/m_i$, looks suspiciously large. I should
not wish to discourage examination of aberrant individual deviations, 
but the significance probability given by the L test will then be too 
low. I have not been able to obtain an expression for the significance 
probability which takes account of the selection after inspection of the 
deviations. From intuitive reasoning, it appears that this probability 
will lie between P (that given by the L test) and kP, where k is the 
number of classes in the goodness of fit test. In Example 4, there were 
reasons for suspecting in advance that the first period would show an 
abnormal drop, since some patients, having learned that they had 
syphilis, might shrink from the long course of treatment that was then 
necessary (the data refer to the 1930’s), while others might go else- 
where for treatment. Thus it might have been decided in advance to 
test the deviation for the first period. On the other hand, if the devi- 
ation is picked out purely from inspection, the significance probability 
appears to lie between P and 12P. Since P is infinitesimal in this ex- 
ample, there is no doubt about the statistical significance. 

The above remarks refer to a single deviation. Suppose now that a
linear function L of a number of the deviations is picked out for testing
because it looks interestingly large. Then a test that takes account of
this selection and errs on the safe side, i.e., gives in general too few
significant results, is obtained by referring $L^2/V(L)$ to the $\chi^2$ table
with the number of degrees of freedom used in the goodness of fit test.

To show this, we shall show that if the coefficients $g_i$ in L are selected
so as to make $L^2/V(L)$ as large as possible, the maximum value of this
quantity is equal to the computed value of $\chi^2$ in the ordinary goodness
of fit test. In other words, if we always picked out the linear function
that would be the "most significant" of all linear functions, we would
obtain a valid test of this function by referring $L^2/V(L)$ to the $\chi^2$
table used for the goodness of fit test. Since in practice a linear function
that is picked out because it is interesting will not generally give the
largest possible normal deviate, we will be making a conservative
test of significance if we refer $L^2/V(L)$ to the $\chi^2$ table.

In Example 1 the normal deviate was 5.91. Since $\chi^2$ for these data
has seven degrees of freedom, a conservative test is to refer $(5.91)^2$ or
34.93 to the $\chi^2$ table with seven degrees of freedom. The result is still
statistically significant.
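
The conservative reference for Example 1 can be illustrated numerically. The sketch below uses the standard closed-form expression for the upper tail of $\chi^2$ with an odd number of degrees of freedom (it is not taken from this paper), so that no tables are needed; the function name is ours.

```python
import math

def chi2_sf_odd(x, df):
    """Upper-tail probability of chi-square for odd df, by the closed-form series."""
    if df < 1 or df % 2 != 1:
        raise ValueError("this sketch handles odd degrees of freedom only")
    root = math.sqrt(x)
    tail = math.erfc(root / math.sqrt(2))           # the df = 1 case
    density = math.exp(-x / 2) / math.sqrt(2 * math.pi)
    term, series = root, 0.0
    for r in range(1, (df - 1) // 2 + 1):
        series += term                              # x^(r - 1/2) / (1*3*...*(2r-1))
        term *= x / (2 * r + 1)
    return tail + 2 * density * series

deviate = 5.91
p_planned = math.erfc(deviate / math.sqrt(2))       # test chosen in advance (two-sided)
p_conservative = chi2_sf_odd(deviate**2, 7)         # test chosen after inspection
print(p_planned < p_conservative, p_conservative < 0.001)
```

The conservative probability is several thousand times larger than the planned one here, yet still far below conventional significance levels.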

To prove the needed result, write V for V(L). Then

$$\frac{\partial}{\partial g_i}\left(\frac{L^2}{V}\right) = \frac{2L}{V}\frac{\partial L}{\partial g_i} - \frac{L^2}{V^2}\frac{\partial V}{\partial g_i}.$$

When we set this equal to zero, we obtain the equations

$$\frac{\partial L}{\partial g_i} = \frac{1}{2}\frac{L}{V}\frac{\partial V}{\partial g_i}. \qquad (18a)$$

Now from section III,

$$L = \sum g_i (f_i - m_i),$$

$$\frac{1}{2}\frac{\partial V}{\partial g_i} = g_i m_i - \frac{m_i}{N}\left(\sum g_j m_j\right) - \frac{1}{I}\frac{\partial m_i}{\partial \theta}\left(\sum g_j \frac{\partial m_j}{\partial \theta}\right).$$

Substitution in (18a) gives, on dividing by $m_i$,

$$\frac{(f_i - m_i)}{m_i} = \frac{L}{V}\left\{ g_i - \frac{1}{N}\left(\sum g_j m_j\right) - \frac{1}{I}\,\frac{1}{m_i}\frac{\partial m_i}{\partial \theta}\left(\sum g_j \frac{\partial m_j}{\partial \theta}\right)\right\}.$$

Multiply both sides by $(f_i - m_i)$ and add over all classes. The left side
becomes the computed value of $\chi^2$ in the goodness of fit test. On the
right side, the first term becomes

$$\frac{L}{V}\left\{\sum g_i (f_i - m_i)\right\} = \frac{L^2}{V}$$

as desired to prove the result. The second and third terms on the right
both vanish, the second because $\sum f_i = \sum m_i$, the third because of the
maximum likelihood equation of estimation for $\theta$. This completes the
proof. The result is valid only asymptotically, since the expression for
V is an approximation. When the expectations are given, it is easy to
show that the result holds absolutely. Conceptually, this test is of the
same kind as that given by Fisher in §64 of The Design of Experiments,
and later by Scheffé [5], for testing any linear combination of the
treatment means in an analysis of variance.


VII. TWO-PARAMETER ESTIMATION 


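The "regression" approach extends to the situation where two
unknown parameters $\theta_1$ and $\theta_2$ are estimated by maximum likelihood.
Since the extension goes smoothly, not all of the details will be
presented. We first quote the analogues of the two results (5) and (7)
from maximum likelihood theory. The analogue of (5) is the pair of
equations

$$I_{11}(\hat{\theta}_1 - \theta_1) + I_{12}(\hat{\theta}_2 - \theta_2) = X_1$$
$$I_{12}(\hat{\theta}_1 - \theta_1) + I_{22}(\hat{\theta}_2 - \theta_2) = X_2 \qquad (19)$$

where

$$I_{uv} = \sum \frac{1}{M_i}\left(\frac{\partial M_i}{\partial \theta_u}\right)\left(\frac{\partial M_i}{\partial \theta_v}\right),$$
$$X_u = \sum \frac{(f_i - M_i)}{M_i}\frac{\partial M_i}{\partial \theta_u}, \qquad u, v = 1, 2,$$

these being straightforward extensions of the previous notation.
If $c_{uv}$ is the inverse of the matrix $I_{uv}$, the solutions of these equations
are

$$\hat{\theta}_1 - \theta_1 = c_{11}X_1 + c_{12}X_2$$
$$\hat{\theta}_2 - \theta_2 = c_{12}X_1 + c_{22}X_2. \qquad (20)$$

The analogue of equation (7) is

$$m_i - M_i = (\hat{\theta}_1 - \theta_1)\frac{\partial M_i}{\partial \theta_1} + (\hat{\theta}_2 - \theta_2)\frac{\partial M_i}{\partial \theta_2}.$$

Hence

$$(f_i - m_i) = (f_i - M_i) - (\hat{\theta}_1 - \theta_1)\frac{\partial M_i}{\partial \theta_1} - (\hat{\theta}_2 - \theta_2)\frac{\partial M_i}{\partial \theta_2}.$$

Substituting from (20) and rearranging, we obtain

$$f_i - m_i = (f_i - M_i) - \left\{c_{11}\frac{\partial M_i}{\partial \theta_1} + c_{12}\frac{\partial M_i}{\partial \theta_2}\right\}X_1 - \left\{c_{21}\frac{\partial M_i}{\partial \theta_1} + c_{22}\frac{\partial M_i}{\partial \theta_2}\right\}X_2.$$

This is the analogue of the key equation (8). It can be shown that the
coefficients of $X_1$ and $X_2$ are the regression coefficients of $(f_i - M_i)$ on
$X_1$ and $X_2$.

Hence, as before, the variance of L is equal to the residual variance
of L' from its multiple regression on $X_1$ and $X_2$. To find this residual
variance, we have

$$\mathrm{Cov}(L', X_1) = \sum g_i \frac{\partial M_i}{\partial \theta_1} = S_1 \ \text{(say)},$$
$$\mathrm{Cov}(L', X_2) = \sum g_i \frac{\partial M_i}{\partial \theta_2} = S_2 \ \text{(say)}.$$

The two regression coefficients (for L' on $X_1$, $X_2$) are
$$b_1 = c_{11}S_1 + c_{12}S_2,$$
$$b_2 = c_{21}S_1 + c_{22}S_2.$$
The reduction in variance due to the regression is
$$b_1 S_1 + b_2 S_2 = c_{11}S_1^2 + 2c_{12}S_1 S_2 + c_{22}S_2^2.$$
This gives, finally,

$$V(L) = \sum g_i^2 M_i - \frac{(\sum g_i M_i)^2}{N} - (c_{11}S_1^2 + 2c_{12}S_1 S_2 + c_{22}S_2^2)$$

as the general formula for two-parameter estimation.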
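
The general two-parameter formula translates directly into a short routine. In the sketch below the expectations and their derivatives are made-up numbers chosen only so that each derivative vector sums to zero, as it must when the expectations always total N; the function name and inputs are hypothetical.

```python
def variance_L_two_param(g, M, dM1, dM2):
    """V(L) when two parameters are estimated, from expectations M and their derivatives."""
    N = sum(M)
    # Information matrix I_uv = sum (1/M_i)(dM_i/dtheta_u)(dM_i/dtheta_v)
    I11 = sum(a * a / m for a, m in zip(dM1, M))
    I12 = sum(a * b / m for a, b, m in zip(dM1, dM2, M))
    I22 = sum(b * b / m for b, m in zip(dM2, M))
    det = I11 * I22 - I12**2
    c11, c12, c22 = I22 / det, -I12 / det, I11 / det    # inverse of I_uv
    S1 = sum(gi * a for gi, a in zip(g, dM1))
    S2 = sum(gi * b for gi, b in zip(g, dM2))
    reduction = c11 * S1**2 + 2 * c12 * S1 * S2 + c22 * S2**2
    sum_g2M = sum(gi * gi * m for gi, m in zip(g, M))
    sum_gM = sum(gi * m for gi, m in zip(g, M))
    return sum_g2M - sum_gM**2 / N - reduction

# Hypothetical four-class illustration (not data from the paper)
M = [10, 20, 30, 40]
dM1 = [-2, -1, 1, 2]
dM2 = [1, -1, -1, 1]
v_single = variance_L_two_param([1, 0, 0, 0], M, dM1, dM2)
v_constant = variance_L_two_param([3, 3, 3, 3], M, dM1, dM2)
print(round(v_single, 4), abs(v_constant) < 1e-9)
```

A constant set of coefficients gives L identically zero, so its variance should vanish; this provides a quick sanity check on any implementation. In practice the estimates $m_i$ would be substituted for the $M_i$.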


VIII. APPLICATION TO THE NORMAL DISTRIBUTION 


In order to obtain a formula applicable to the normal distribution,
we may use the fact that the equations of estimation make the mean
and variance of the theoretical distribution equal to those of the sample.
Let $d_i$ be the center of the ith class. It will simplify the algebra if the
origin is placed at the sample mean. Then we take

$$X_1 = \sum d_i (f_i - M_i),$$
$$X_2 = \sum d_i^2 (f_i - M_i).$$

These choices assume that the sample is large and that the grouping
into classes is not too coarse. Because of the grouping, $\sum d_i M_i$ and
$\sum d_i^2 M_i$ will not exactly equal the first two moments of the continuous
normal distribution and it is assumed here that the discrepancies are
not serious. Moreover, in practice we equate the sample mean square
(dividing by N−1) to its expectation, instead of the sum of squares,
so that terms in 1/N are considered negligible.

It is convenient to have a general formula for the covariance of any
two linear functions

$$H = \sum h_i (f_i - M_i) : \quad K = \sum k_i (f_i - M_i).$$

From equations (2) and (3) it is found that

$$\mathrm{Cov}(H, K) = \sum h_i k_i M_i - \frac{(\sum h_i M_i)(\sum k_i M_i)}{N}. \qquad (24)$$

The result remains valid when H and K are identical, in which case the
covariance becomes a variance.

Hence we obtain the results needed for setting up the regression
equations.

$$\mathrm{Cov}(L', X_1) = \sum g_i d_i M_i \qquad (25)$$
since $\sum d_i M_i = 0$, because the origin is at the sample mean.
$$\mathrm{Cov}(L', X_2) = \sum g_i d_i^2 M_i - \frac{(\sum g_i M_i)(\sum d_i^2 M_i)}{N} = \sum g_i M_i (d_i^2 - s^2), \qquad (26)$$
where $s^2$ is the sample variance.
$$V(X_1) = \sum d_i^2 M_i = N s^2, \qquad (27)$$
$$\mathrm{Cov}(X_1, X_2) = \sum d_i^3 M_i = 0, \qquad (28)$$
since the third moment of the theoretical normal distribution is zero.
Finally,
$$V(X_2) = \sum d_i^4 M_i - \frac{(\sum d_i^2 M_i)^2}{N} = 2N s^4 \qquad (29)$$
since $\mu_4 = 3\mu_2^2$ for the normal distribution.

The five equations (25-29) enable us to set up the regression
equations of L' on $X_1$ and $X_2$. For the normal distribution, the equations
separate, since Cov $(X_1, X_2) = 0$. Hence, the estimated variance is
approximately

$$V(L) = \sum g_i^2 m_i - \frac{(\sum g_i m_i)^2}{N} - \frac{[\mathrm{Cov}(L, X_1)]^2}{V(X_1)} - \frac{[\mathrm{Cov}(L, X_2)]^2}{V(X_2)}$$
$$= \sum g_i^2 m_i - \frac{(\sum g_i m_i)^2}{N} - \frac{(\sum g_i d_i m_i)^2}{N s^2} - \frac{[\sum g_i m_i (d_i^2 - s^2)]^2}{2N s^4}.$$


APPENDIX
An example of the small-sample distribution of L'

The example refers to a multinomial distribution with three classes.
The sample size is 10, and the expectations in the three classes are 5, 3
and 2, respectively. This example has been discussed previously, with
respect to the ordinary $\chi^2$ test, by Neyman and Pearson [4]. Note that
the expectations are fixed. It would be more revealing to work some
examples in which the expectations are estimated from the data, but
this is considerably more laborious.

The probabilities of each of the 66 possible configurations of the
sample were first computed. From these, the exact frequency
distributions were worked out for the following two linear functions:

$$L_1' = 6(f_1 - M_1) + 3(f_2 - M_2) + (f_3 - M_3)$$
and
$$L_2' = (f_1 - M_1) + 3(f_2 - M_2) + 6(f_3 - M_3).$$

In $L_1'$, the class with the highest expectation receives the highest
weight and the class with the lowest expectation receives the lowest
weight. In $L_2'$, these weights are reversed. It was thought that the
normal approximation might agree better with the exact distribution
for $L_1'$ than for $L_2'$. On the null hypothesis, both $L_1'$ and $L_2'$ have
means zero: their standard deviations, as found from equation (4),
are 6.3953 and 6.0332, respectively. The exact probabilities and the
normal approximations for a two-tailed test are shown in Table 6 for
the region in which the exact probability lies between 0.25 and 0.005.

For both $L_1'$ and $L_2'$, the normal approximation tends to
underestimate the probability, i.e., to give too many apparently significant
results. Table 6 also shows the errors in the normal probabilities as
percentages of the true probabilities. The averages of these percentage
errors, ignoring sign, are 12 per cent for $L_1'$ and 10 per cent for $L_2'$.
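
The enumeration described above is small enough to repeat directly. The sketch below lists the 66 configurations, computes the standard deviation of each linear function from the fixed-expectation variance formula Σg²M − (ΣgM)²/N, and evaluates exact and normal two-tailed probabilities. It assumes that "two-tailed" means the probability that |L'| equals or exceeds the stated deviate; the function names are ours.

```python
import math

N = 10
EXPECT = (5, 3, 2)                       # fixed expectations M_1, M_2, M_3
PROBS = [m / N for m in EXPECT]

def configurations():
    """Yield each sample configuration (f1, f2, f3) with its multinomial probability."""
    for f1 in range(N + 1):
        for f2 in range(N + 1 - f1):
            f3 = N - f1 - f2
            coef = math.factorial(N) // (
                math.factorial(f1) * math.factorial(f2) * math.factorial(f3))
            yield (f1, f2, f3), coef * PROBS[0]**f1 * PROBS[1]**f2 * PROBS[2]**f3

def sd_formula(g):
    """Standard deviation of L' with fixed expectations."""
    s2 = sum(gi * gi * m for gi, m in zip(g, EXPECT))
    s1 = sum(gi * m for gi, m in zip(g, EXPECT))
    return math.sqrt(s2 - s1 * s1 / N)

def exact_two_tail(g, d):
    """Exact probability that |L'| >= d (assumed meaning of the two-tailed test)."""
    return sum(pr for f, pr in configurations()
               if abs(sum(gi * (fi - m) for gi, fi, m in zip(g, f, EXPECT))) >= d)

def normal_two_tail(g, d):
    """Normal approximation to the same probability, without continuity correction."""
    return math.erfc(d / sd_formula(g) / math.sqrt(2))

G1, G2 = (6, 3, 1), (1, 3, 6)
print(round(sd_formula(G1), 4), round(sd_formula(G2), 4))
print(round(normal_two_tail(G1, 8), 3), round(normal_two_tail(G2, 8), 3))
```

Calling `exact_two_tail(G1, d)` for d = 8, ..., 17 gives the exact column under the stated assumption about the tails.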

TABLE 6
COMPARISON OF EXACT PROBABILITIES AND
NORMAL APPROXIMATIONS

                     $L_1'$                         $L_2'$
 Deviate    Exact    Normal    Error      Exact    Normal    Error
              P      approx.   in %         P      approx.   in %
    8       .257     .211      -18        .213     .185      -13
    9       .172     .159      - 8        .156     .136      -13
   10       .148     .118      -20        .120     .0903     -25
   11       .0997    .0854     -14        .0723    .0683     - 6
   12       .0656    .0607     - 7        .0563    .0467     -17
   13       .0549    .0422     -23        .0303    .0312     + 3
   14       .0289    .0286     - 1        .0210    .0203     - 3
   15       .0185    .0190     + 3        .0138    .0129     - 7
   16       .0130    .0123     - 5        .0084    .0080     - 5
   17       .0064    .0079     +23        .0052    .0048     - 8


Since the values of the L''s proceed by integers, it would be easy to
apply a correction for continuity in calculating the normal
approximation. This correction removes the tendency of the normal
approximation to underestimate the true probabilities: in fact the corrected
values are mostly overestimates. Apart from this effect, the correction
does not improve the approximation. With the correction, the average
percentage errors are 15 per cent for $L_1'$ and 12 per cent for $L_2'$.

Contrary to expectations, the normal approximation is not closer for
$L_1'$ than for $L_2'$. It is, however, closer in single-tailed tests because the
distribution of $L_1'$ is not far from symmetrical, whereas that of $L_2'$ is
quite skew.

Although the user must judge for himself whether these
approximations are satisfactory for practical use, the agreement seems surprisingly
good when one considers that the largest expectation is 5.

In conclusion, I wish to thank Dr. W. Kruskal and Dr. P. Meier
for some useful suggestions.


THE ESTIMATION OF AN OPTIMUM
SUBSAMPLING NUMBER*


SAMUEL H. BROOKS
The Johns Hopkins University 


I, INTRODUCTION 


IN TWO-STAGE sampling the usual criterion for the optimum number
of subunits to be sampled from a primary unit is that number
which minimizes the cost variance product. It is well-known that this
depends on the ratio of the variance within primary units to the
variance between primary units. Estimates of this ratio are often made from
data at hand or from a pilot sample. This paper discusses the
estimation of the optimum subsampling number, $m_{op}$, such that when this
estimate of $m_{op}$ is used to take the main sample, the precision of the
resulting estimate of the population mean averages at least 90 per cent
of the precision obtainable if $m_{op}$ were used. It is shown that in some
cases an adequate subsampling number can be estimated without a
pilot sample.


II. ELEMENTARY THEORY 


This section, containing some of the elementary definitions and 
results of two-stage sampling theory, is included in order to have a 
unified presentation. More complete discussions can be found in 
Deming [3], Cochran [2], and Hansen, Hurwitz and Madow [5]. 


A. The subsampling number in two-stage sampling 


Two-stage sampling is often used to estimate the mean of a popula- 
tion comprised of units which contain the fundamental elements of the 
population. The first stage is the selection of a sample of units. The 
second stage is the subsampling of these units. The subsampling num- 
ber is the number of elements sampled within each of the units selected 
in the first stage. These units and elements are frequently called 
primary units and secondary units respectively. 

An agricultural study of the sugar content of a variety of sugar beets
may serve as an example of two-stage sampling. The statistic of
interest is the average per cent sugar content of the beets in a field. The
field is divided into a number of plots. The first stage of sampling is
the selection of n plots to be sampled. The second stage is the sampling
of m beets from each plot selected.

Two-stage sampling is to be distinguished from sequential procedures 
such as two-sample sampling and double sampling in which the results 
of the first sample indicate how a second sample should be made. 
However, it is clear that two-stage sampling may be used in a two- 
sample procedure if the population has the appropriate structure. In 
fact, the pilot sample designs in this paper may be considered the first 
sample of a two-sample procedure. 

Precision and economy are the criteria in the choice of an appropri- 
ate subsampling number. If the elements vary little within a unit and 
vary much from unit to unit, it is reasonable to use a low subsampling 
number. However, if the sampling of a unit is very expensive relative 
to the sampling of an element, a large subsampling number is appropri- 
ate. 


B. The model of the population and of the cost of a sample 


$\bar{Y}$ = population mean
$u_i$ = deviation of the mean of the ith unit from the population mean
$e_{ij}$ = deviation of the jth element of the ith unit from the mean of
the ith unit
$y_{ij}$ = value of the jth element of the ith unit
M = number of elements contained in each unit, constant for all units
in the population
N = number of units in the population


$$y_{ij} = \bar{Y} + u_i + e_{ij}. \qquad (1)$$


The average value of $u_i$ is zero, and within the ith unit the average
value of $e_{ij}$ is zero. The variance of $u_i$ is $S_u^2$, and the variance of $e_{ij}$ is
$S_w^2$. Analogous to their meanings for infinite populations these
variances are defined for finite populations by the relations:


$$S_w^2 = \frac{1}{N(M-1)}\sum_i \sum_j (y_{ij} - \bar{Y}_i)^2,$$


$$S_w^2 + M S_u^2 = \frac{M}{N-1}\sum_i (\bar{Y}_i - \bar{Y})^2,$$


where $\bar{Y}_i$ is the mean value of the ith unit.
Both units and elements within units are selected by random
sampling, so that the variates $u_i$ and $e_{ij}$ are independently distributed.

$C_u$ = cost of selecting a single unit for subsampling
n = number of units selected for subsampling
$C_w$ = cost of sampling an element within a selected unit
m = number of elements sampled within each selected unit
(subsampling number)
$C_s$ = cost of executing the two-stage sample


$$C_s = C_u n + C_w n m. \qquad (2)$$


C. Sample mean and variance 


An unbiased estimate of the population mean per element is the
simple average of all the elements in the sample,


$$\bar{y} = \frac{1}{nm}\sum_{i=1}^{n}\sum_{j=1}^{m} y_{ij}.$$

The variance of this estimate is


$$V(\bar{y}) = \left(\frac{1}{n} - \frac{1}{N}\right)S_u^2 + \left(\frac{1}{nm} - \frac{1}{NM}\right)S_w^2. \qquad (3)$$


An unbiased estimate of $V(\bar{y})$ from the sample is


$$\frac{1}{m}\left(\frac{1}{n} - \frac{1}{N}\right)s_b^2 + \frac{1}{N}\left(\frac{1}{m} - \frac{1}{M}\right)s_w^2,$$

where $s_b^2$ is the between units mean square and $s_w^2$ is the within units
mean square from the sample as computed in an analysis of variance
table.


D. The optimum subsampling number 


The value of m is here found which will minimize the variance of the
sample estimate of the population mean when the cost of the
two-stage sample is fixed. From equations (2) and (3), eliminating 1/n,

$$V(\bar{y}) = \frac{C_w S_u^2}{C_s}\left(\frac{C_u}{C_w} + \frac{S_w^2}{S_u^2} + m + \frac{1}{m}\frac{C_u}{C_w}\frac{S_w^2}{S_u^2}\right) - \frac{S_u^2}{N}\left(1 + \frac{1}{M}\frac{S_w^2}{S_u^2}\right). \qquad (4)$$

By setting the derivative of $V(\bar{y})$ with respect to m equal to zero,
$m_{op}$, the value of m which minimizes the variance of the estimate of
the mean when the sample cost is fixed, is found:


$$\frac{dV(\bar{y})}{dm} = \frac{C_w S_u^2}{C_s}\left(1 - \frac{1}{m^2}\frac{C_u}{C_w}\frac{S_w^2}{S_u^2}\right) = 0.$$


The optimum subsampling number is


$$m_{op} = \sqrt{\frac{C_u}{C_w}}\,\frac{S_w}{S_u}. \qquad (5)$$


If $V(\bar{y})$ is fixed, equation (4) may be solved for $C_s$,


$$C_s = \frac{C_w S_u^2\left(\dfrac{C_u}{C_w} + \dfrac{S_w^2}{S_u^2} + m + \dfrac{1}{m}\dfrac{C_u}{C_w}\dfrac{S_w^2}{S_u^2}\right)}{V(\bar{y}) + \dfrac{S_u^2}{N}\left(1 + \dfrac{1}{M}\dfrac{S_w^2}{S_u^2}\right)}. \qquad (6)$$


It may be seen that the variable m appears in equation (6) in the
same manner as in equation (4), so that the $m_{op}$ of equation (5) is also
the value of m which will minimize the cost of the sample if the
variance of the estimate of the mean is fixed; thus it minimizes the cost
variance product. The quantitative expression for $m_{op}$ in equation (5)
is consistent with the criteria in section II A.

The corresponding number of units to be sampled, n, is obtained
from equation (2) if the cost of the sample is fixed, or from equation
(3) if the variance of the estimate of the mean is fixed.
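
The minimizing property of equation (5) is easily confirmed numerically. The sketch below evaluates the variance of the sample mean for a fixed total cost, using equations (2) and (3) as read above, and compares neighbouring values of m; the cost and variance figures are hypothetical and the function names are ours.

```python
import math

def m_optimum(cost_ratio, var_ratio):
    """Equation (5): cost_ratio = C_u/C_w, var_ratio = S_w^2/S_u^2."""
    return math.sqrt(cost_ratio * var_ratio)

def variance_fixed_cost(m, Cs, Cu, Cw, Su2, Sw2, N=math.inf, M=math.inf):
    """V(y-bar) when n is determined from the fixed cost Cs = Cu*n + Cw*n*m."""
    n = Cs / (Cu + Cw * m)
    return (1 / n - 1 / N) * Su2 + (1 / (n * m) - 1 / (N * M)) * Sw2

# Hypothetical illustration: C_u/C_w = 16, S_w^2/S_u^2 = 4
m_op = m_optimum(16, 4)
v = {m: variance_fixed_cost(m, Cs=1000, Cu=16, Cw=1, Su2=1, Sw2=4) for m in (7, 8, 9)}
print(m_op, v[8] < v[7], v[8] < v[9])
```

The non-integral n implied by a fixed cost is ignored here, as it is in the derivation of equation (4).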

If M, N, $C_u$, $C_w$, $S_u^2$, and $S_w^2$ are well known, and $m_{op}$ is used, then
the cost of a two-stage sample which will yield an estimate of the mean
having the desired or target variance $T(\bar{y})$ is


$$C_s = \frac{(\sqrt{C_u}\,S_u + \sqrt{C_w}\,S_w)^2}{T(\bar{y}) + \dfrac{1}{N}\left(S_u^2 + \dfrac{1}{M}S_w^2\right)}.$$


E. Estimation of $m_{op}$


The optimum subsampling number may be estimated from a pilot
sample or from an available array of data which may be considered
as a pilot sample. Let h be the number of units selected for sampling
in the first stage of the pilot sample, and let k be the subsampling
number used in the second stage of the pilot sample. The numbers h
and k will be used as the dimensions of the pilot sample, and the
numbers n and m will be the corresponding dimensions of the
subsequent main sample. It is assumed that the cost ratio $C_u/C_w$ is known
exactly. That this assumption is not critical will be shown later. Thus,
the estimate of $m_{op}$ is dependent simply on the estimate of the variance
ratio, $S_w^2/S_u^2$, which is obtained from the analysis of variance table
resulting from the pilot sample:


 Source of Variation              Degrees of    Mean Squares    Expected Value of
                                  Freedom       from Sample     Mean Squares

 Between units                    h-1           $s_b^2$         $S_b^2 = S_w^2 + kS_u^2$
 Within units between elements    h(k-1)        $s_w^2$         $S_w^2$


It may be seen from the expected values that

$$\frac{S_w}{S_u} = \left(\frac{S_w^2}{\frac{1}{k}(S_b^2 - S_w^2)}\right)^{1/2} = \left(\frac{k}{\dfrac{S_b^2}{S_w^2} - 1}\right)^{1/2}.$$

So that the estimate of $S_w/S_u$ from the sample is

$$\left(\frac{k}{\dfrac{s_b^2}{s_w^2} - 1}\right)^{1/2}.$$

The estimate of the optimum subsampling number, which shall be
denoted by $\hat{m}_{op}$, is obtained from equation (5) by the direct substitution
of the above estimate,

$$\hat{m}_{op} = \left(\frac{C_u}{C_w}\,\frac{k}{\dfrac{s_b^2}{s_w^2} - 1}\right)^{1/2}. \qquad (7)$$

Ordinarily, an integral value of $m_{op}$ is required. The estimate,
formula (7), does not yield such a value in general. The procedure for
selecting the subsampling number in this case is to use the integer m
which satisfies the relation¹


$$m(m - 1) \le \hat{m}_{op}^2 < m(m + 1).$$


As an example, suppose that $\hat{m}_{op} = 7.50$ has been obtained from
equation (7). Since $8(7) \le (7.50)^2 < 8(9)$, it may be seen that the subsampling
number to be used is m = 8. It is of interest to note that $m = \infty$ is
indicated whenever $s_b^2/s_w^2 < 1$, implying that each unit in the sample
should have all its elements enumerated.
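
The estimation rule and the choice of the integer m can be sketched as below. The sketch takes formula (7) as read above and treats a mean square ratio of 1 or less as indicating complete enumeration; the handling of exact equality is our assumption, and the function names are illustrative.

```python
import math

def m_hat(cost_ratio, k, sb2, sw2):
    """Formula (7): estimate of m_op from pilot mean squares (cost_ratio = C_u/C_w)."""
    if sb2 <= sw2:
        return math.inf            # complete enumeration of each unit is indicated
    return math.sqrt(cost_ratio * k / (sb2 / sw2 - 1))

def integer_m(estimate):
    """Integer m satisfying m(m-1) <= estimate^2 < m(m+1)."""
    if math.isinf(estimate):
        return math.inf
    m = 1
    while m * (m + 1) <= estimate**2:
        m += 1
    return m

print(integer_m(7.50), integer_m(7.40), m_hat(16, 4, 5.0, 1.0))
```

The rule rounds at the geometric mean of consecutive integers rather than at the half-integer, so 7.50 is taken up to 8 while 7.40 goes down to 7.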

It is conceivable that a non-integral value might be used for the sub- 
sampling number. This effect may be achieved by using one subsam- 
pling number on some proportion of the units and another subsampling 
number on the rest of the units which are in the sample. However, 
such efforts would result in a negligible gain in precision which would 
be offset by the necessary bookkeeping. 


III. THE EXPECTED RELATIVE PRECISION OF A PILOT SAMPLE DESIGN 
A. The relative precision of a subsampling number 


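We shall define the relative precision, RP, of a subsampling number
m as the ratio of the variance of $\bar{y}$ given by $m_{op}$ to the variance of $\bar{y}$ given
by m for the same sample cost. If the finite population correction can
be ignored, that is, if N is large relative to n, then from equation (4)


$$RP(m) = \frac{\dfrac{C_w S_u^2}{C_s}\left(\dfrac{C_u}{C_w} + \dfrac{S_w^2}{S_u^2} + m_{op} + \dfrac{1}{m_{op}}\dfrac{C_u}{C_w}\dfrac{S_w^2}{S_u^2}\right)}{\dfrac{C_w S_u^2}{C_s}\left(\dfrac{C_u}{C_w} + \dfrac{S_w^2}{S_u^2} + m + \dfrac{1}{m}\dfrac{C_u}{C_w}\dfrac{S_w^2}{S_u^2}\right)}.$$


Substitution for $m_{op}$ from equation (5) gives


$$RP(m) = \frac{\left(\sqrt{\dfrac{C_u}{C_w}} + \dfrac{S_w}{S_u}\right)^2}{\dfrac{C_u}{C_w} + \dfrac{S_w^2}{S_u^2} + m + \dfrac{1}{m}\dfrac{C_u}{C_w}\dfrac{S_w^2}{S_u^2}}. \qquad (8)$$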
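
Equation (8), as read here, depends only on m and on the two ratios. A minimal sketch follows, with hypothetical ratios $C_u/C_w = 16$ and $S_w^2/S_u^2 = 4$ (so that $m_{op} = 8$); the function name is ours.

```python
import math

def relative_precision(m, cost_ratio, var_ratio):
    """Equation (8): cost_ratio = C_u/C_w, var_ratio = S_w^2/S_u^2."""
    numerator = (math.sqrt(cost_ratio) + math.sqrt(var_ratio))**2
    denominator = cost_ratio + var_ratio + m + cost_ratio * var_ratio / m
    return numerator / denominator

for m in (4, 8, 16):
    print(m, round(relative_precision(m, 16, 4), 3))
```

Halving or doubling the optimum costs the same 10 per cent of precision in this illustration, which reflects the symmetry of equation (8) in m and $m_{op}^2/m$.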





¹ Cameron [1] indicates that this procedure was originated by Churchill Eisenhart.

B. Distribution of the estimate of the optimum subsampling number


If $u_i$ and $e_{ij}$ may be considered normally distributed, the distribution
of $s_w^2$ is $S_w^2\chi^2$ with h(k−1) degrees of freedom and $s_b^2$ is distributed
as $(S_w^2 + kS_u^2)\chi^2$ with (h−1) degrees of freedom. The ratio

$$\frac{s_w^2/S_w^2}{s_b^2/(S_w^2 + kS_u^2)} = \frac{s_w^2}{s_b^2}\left(1 + k\frac{S_u^2}{S_w^2}\right)$$

is distributed as F where F has h(k−1) and (h−1) degrees of freedom.
From equation (7)

$$\frac{s_b^2}{s_w^2} = 1 + \frac{k}{\hat{m}_{op}^2}\frac{C_u}{C_w}.$$

By substitution the random variable $\hat{m}_{op}$ can be related to F,

$$F = \frac{1 + k\dfrac{S_u^2}{S_w^2}}{1 + \dfrac{k}{\hat{m}_{op}^2}\dfrac{C_u}{C_w}}.$$

For computational convenience this may be expressed in terms of the
incomplete β-function, using Karl Pearson's notation. The parameters
are

$$p = \frac{h(k - 1)}{2}, \qquad q = \frac{h - 1}{2}.$$

The random variable is

$$x(\hat{m}_{op}) = \frac{pF}{pF + q}.$$


C. Expected relative precision of a pilot sample design


We shall consider a pilot sample design to be the specification of h,
the number of units to be selected for sampling, and of k, the
subsampling number. The units and elements are to be selected randomly
in the pilot sample as well as in the main sample. For the resultant
main sample design the subsampling number m is obtained from the
pilot sample as indicated in Section II E, and n, the number of units
to be selected for subsampling, is found from equation (2) since it is
assumed that the cost of the main sample has been fixed. We shall
define the expected relative precision of a pilot sample design as being
the average relative precision of m's which result from the use of that
design for fixed cost and variance ratios.

The relative frequency, f(m), with which the integer m is indicated
as the subsampling number by a pilot sample design is the relative
frequency that the design yields an estimate of the subsampling
number, $\hat{m}_{op}$, equation (7), which is between $\sqrt{m(m-1)}$ and $\sqrt{m(m+1)}$,

$$f(m) = \frac{1}{B(p, q)}\int_{x(\sqrt{m(m-1)})}^{x(\sqrt{m(m+1)})} x^{p-1}(1 - x)^{q-1}\,dx.$$


Since this m has the relative precision given by equation (8), the
expected relative precision, ERP, of the pilot sample design is


$$ERP = \sum_{m=1}^{M} RP(m) f(m). \qquad (9)$$


When

$$\frac{C_u}{C_w}\frac{S_w^2}{S_u^2}$$

is greater than 16 and M is large, this summation is well approximated
by the integral


$$ERP = \frac{1}{B(p, q)}\int RP(m(x))\,x^{p-1}(1 - x)^{q-1}\,dx \qquad (10)$$

where

$$m(x) = \left[\frac{k\,\dfrac{C_u}{C_w}}{\dfrac{p}{q}\left(\dfrac{1}{x} - 1\right)\left(1 + k\dfrac{S_u^2}{S_w^2}\right) - 1}\right]^{1/2}.$$
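
The definition of the expected relative precision can be illustrated by simulation. The sketch below is not the incomplete β-function computation described in the text; it substitutes a Monte Carlo average over simulated pilot samples, assuming normally distributed $u_i$ and $e_{ij}$ so that the mean squares are scaled $\chi^2$ variates. The function names, the cap M = 1000 and the design values are all hypothetical.

```python
import math
import random

def relative_precision(m, cost_ratio, var_ratio):
    """Equation (8) as read above: cost_ratio = C_u/C_w, var_ratio = S_w^2/S_u^2."""
    numerator = (math.sqrt(cost_ratio) + math.sqrt(var_ratio))**2
    return numerator / (cost_ratio + var_ratio + m + cost_ratio * var_ratio / m)

def simulate_erp(h, k, cost_ratio, var_ratio, M=1000, reps=20000, seed=0):
    """Average relative precision of the m's produced by an (h, k) pilot design."""
    rng = random.Random(seed)
    Su2, Sw2 = 1.0, var_ratio                  # only the ratio matters
    df_b, df_w = h - 1, h * (k - 1)
    total = 0.0
    for _ in range(reps):
        # Mean squares: expected value times chi-square / degrees of freedom
        sb2 = (Sw2 + k * Su2) * rng.gammavariate(df_b / 2, 2) / df_b
        sw2 = Sw2 * rng.gammavariate(df_w / 2, 2) / df_w
        if sb2 <= sw2:
            m = M                               # enumerate every element
        else:
            est2 = cost_ratio * k / (sb2 / sw2 - 1)      # square of formula (7)
            # integer m with m(m-1) <= est2 < m(m+1), capped at M
            m = min(M, max(1, math.floor(math.sqrt(est2 + 0.25) + 0.5)))
        total += relative_precision(m, cost_ratio, var_ratio)
    return total / reps

erp = simulate_erp(h=10, k=5, cost_ratio=16, var_ratio=4, reps=2000)
print(0 < erp <= 1)
```

Increasing `reps` reduces the sampling error of the average; the seed is fixed only so that repeated runs agree.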

It may be seen that the expected relative precision of a pilot sample 
design is determined by h, k, M, C./C. and S,?/S,?. 





m(z) = 
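
The relative frequency $f(m)$ can be evaluated numerically. The following is only an illustrative sketch, not the computation used for Table I. It assumes the relation $F = (1 + kS_u^2/S_w^2)/(1 + kC_u/(C_e m_{op}^2))$ between $F$ and $m_{op}$, takes the true cost and variance ratios as inputs, and replaces tables of the incomplete β-function by Simpson's rule. The function names are hypothetical.

```python
import math

def beta_x(m_op, h, k, cost_ratio, var_ratio):
    """x(m_op) = pF / (pF + q); var_ratio is the true S_w^2 / S_u^2 (assumed known)."""
    p, q = h * (k - 1) / 2.0, (h - 1) / 2.0
    if m_op <= 0.0:
        return 0.0
    # Assumed link between F and m_op, obtained from equation (7).
    F = (1.0 + k / var_ratio) / (1.0 + k * cost_ratio / m_op ** 2)
    return p * F / (p * F + q)

def beta_mass(lo, hi, p, q, steps=2000):
    """Simpson's-rule integral of the Beta(p, q) density between lo and hi."""
    log_b = math.lgamma(p) + math.lgamma(q) - math.lgamma(p + q)

    def pdf(x):
        if x <= 0.0 or x >= 1.0:
            return 0.0  # adequate when p > 1 and q >= 1
        return math.exp((p - 1) * math.log(x) + (q - 1) * math.log(1 - x) - log_b)

    width = (hi - lo) / steps
    total = pdf(lo) + pdf(hi)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(lo + i * width)
    return total * width / 3.0

def f(m, h, k, cost_ratio, var_ratio):
    """Relative frequency with which the pilot design indicates the integer m."""
    p, q = h * (k - 1) / 2.0, (h - 1) / 2.0
    lo = beta_x(math.sqrt(m * (m - 1)), h, k, cost_ratio, var_ratio)
    hi = beta_x(math.sqrt(m * (m + 1)), h, k, cost_ratio, var_ratio)
    return beta_mass(lo, hi, p, q)

# Example: h = k = 7 with both ratios equal to 2, so the true optimum is m = 2.
for m in range(1, 6):
    print(m, round(f(m, 7, 7, 2.0, 2.0), 3))
```

The frequencies need not sum to one over finite $m$, because a pilot sample whose between-unit mean square does not exceed the within-unit mean square gives no finite estimate of $m_{op}$.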


IV. PILOT SAMPLE DESIGNS HAVING AN EXPECTED RELATIVE
PRECISION OF 90 PER CENT


A. Construction of Table I 


The pilot sample designs of Table I were found by using equations
(9) and (10). For fixed cost and variance ratios and $M = \infty$, the
expected relative precisions of combinations of $h$ and $k$ were computed.
For each $h$ considered, the value of $k$ for which ERP = 90 per cent was
found by inverse interpolation. Of these, the combination of h and k 
which forms the least expensive pilot sample was taken as the “opti- 
mum” design. 

Preliminary work had indicated that the expected relative precision 
of a pilot sample design was relatively insensitive to the cost ratio. 
Therefore, the values of $C_u/C_e$ which were considered in the
calculations were rather widely spaced. They were: 0.01, 1.0, 16, and 100.

The variance ratios considered in the calculations were $S_w^2/S_u^2$ = 0.25,
1.0, 4.0, 16, and 64. It is clear from Section V that when the variance
ratio is known to be small a subsampling number with high relative 
precision can be determined without resorting to a pilot sample. Pilot 
sample designs corresponding to variance ratios of 0.5 or less were 
included in Table I, but the reader is advised to consider Section V 
carefully if he feels that the variance ratio of the population he intends 
to study is as small as this. 

The pilot sample designs of Table I were interpolated from the 
designs computed for the cost and variance ratios indicated above. 
Only one value of the parameter M, the number of elements contained 
in each unit, was considered. The pilot sample designs of Table I
correspond to the value $M = \infty$.

For each combination of the above parameters, equation (9) or (10) 
was used to determine the expected relative precision of a number of 
pilot sample designs. For each h considered, the ERP was computed 
for several values of k. By graphical interpolation the value of k was 
found which corresponded to ERP =90 per cent. By this means, sev- 
eral pilot sample designs having ERP=90 per cent were found for 
each combination of the cost and variance ratios. 

It was then pertinent to consider which of these pilot sample designs 
would cost the least to execute. The assumption was made that the 
cost ratio for the pilot sample is the same as for the main sample. This 
does not imply that the per unit and per element costs in the pilot 
sample must be the same as those in the main sample, but does imply 
a proportionality between these costs. If this constant of
proportionality is $r$ then, analogous to equation (2), the cost of a
pilot sample is

$$C_p = rC_u h + rC_e hk.$$

For comparison purposes it is sufficient to consider a pilot sample
cost function, $C_c$, which is proportionate to $C_p$,

$$C_c = \frac{C_p}{rC_e} = h\left[\frac{C_u}{C_e} + k\right]. \tag{11}$$

By using equation (11) the relative costs of executing the pilot sample
designs can be compared. As an example, consider the designs that
were found to have ERP = 90 per cent for $C_u/C_e = 16$ and $S_w^2/S_u^2 = 64$,

      h       k       C_c

      5     333     1,745
      7     171     1,309
      9     118     1,206
     11      93     1,199
     13      78     1,222
     15      68     1,260

When $C_c$ is plotted against $h$, $C_c$ has a minimum value of 1,198 near
$h = 10$. The corresponding value of $k$ is found by substituting these
values in equation (11),

$$1198 = 10(16 + k).$$

The integer $k = 104$ satisfies this relation. Therefore, the "optimum"
sample design in this case is $h = 10$, $k = 104$.
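
The cost comparison of equation (11) is easily mechanized. The sketch below is illustrative only; it takes the $k$ values and costs of the example above, together with the $h$ values that equation (11) implies for them, and the function name is hypothetical.

```python
def pilot_cost(h, k, cost_ratio):
    """Relative pilot sample cost C_c = h (C_u/C_e + k), equation (11)."""
    return h * (cost_ratio + k)

# (h, k) pairs having ERP = 90 per cent for C_u/C_e = 16 and S_w^2/S_u^2 = 64.
# The h values are those implied by equation (11) for each tabulated k and C_c.
designs = [(5, 333), (7, 171), (9, 118), (11, 93), (13, 78), (15, 68)]

costs = {(h, k): pilot_cost(h, k, 16) for h, k in designs}
cheapest = min(costs, key=costs.get)

for (h, k), c in costs.items():
    print(f"h={h:2d}  k={k:3d}  C_c={c}")
print("cheapest tabulated design:", cheapest)
print("interpolated design h=10, k=104 costs", pilot_cost(10, 104, 16))
```

Among the tabulated designs the cheapest is $h = 11$, $k = 93$; the interpolated design $h = 10$, $k = 104$ has essentially the same cost.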


B. Application of Table I 


Table I contains designs of pilot samples to be used to estimate the 
optimum subsampling number such that, when this estimate is used 
in the main sample, it will have a relative precision of 90 per cent on 
the average. 

A population for which the designs of Table I are applicable fits the 
model described in Section II B. It is to be comprised of an infinite 
number of units each containing an infinite number of elements, that 
is, $N = M = \infty$. Furthermore, $u_i$ and $e_{ij}$ are to be normally distributed.
If the population does not fit the assumption that N and M both be 
very large, then the designs of Table I are conservative in the sense 
that when applied to such a population, these designs would have an 
expected relative precision of greater than 90 per cent. It is also assumed
that the cost ratio is the same for the pilot sample as that for the main 
sample, but moderate deviation from this assumption has little effect. 

The fundamental assumption of an equal number of elements in 
each unit should be carefully considered. Failure to have units of 
nearly equal size, say within 5 per cent of their average, may cause a 
greater loss of precision. In many cases this difficulty may be allevi- 
ated by carefully redefining the unit. For example, the block is a con- 
venient unit in sampling the households of a city for some character- 
istic. However, in sparsely populated regions of the city several blocks 
may be treated as one unit, and in densely populated regions a part of 
a city block is considered to be the unit. By this means units of ap- 
proximately equal size might be constructed. 

A cost ratio and a variance ratio must be estimated in order to select 
a pilot sample design from Table I. The cost ratio may be obtained 
from the experience of previous investigations or might be estimated 
from a knowledge of the nature of the proposed investigation. Even a 
rather crude approximation to the cost ratio could be used without 
much reduction in the expected relative precision. The value of the 
variance ratio to be used in entering Table I is the highest such ratio 
believed possible in the population to be sampled. Thus, if this ex- 
treme estimate is in fact the “true” variance ratio of the population, 
the pilot sample design will have an expected relative precision of 90 
per cent. If this estimate is greater than the “true” variance ratio, 
the expected relative precision is greater than 90 per cent. 


C. An example of the use of Table I 


In the manufacture of printed cardboard cartons such as breakfast 
cereal boxes, the moisture content of the cardboard is an important 
factor affecting the alignment of the printed colors. Therefore, one 
measure of the quality of a shipment of cardboard received from a 
supplier by a printing concern is the average moisture content of the 
cardboard in that shipment. One such concern, the Lord Baltimore 
Press in Baltimore, Maryland, has recently developed a sampling 
procedure to make such determinations. 

A shipment of cardboard is well suited to two-stage sampling. The 
cardboard is delivered in 2,000 sheet packages called skids. A single 
shipment may consist of 300 skids. A skid may be regarded as a unit 
and the measurements of the moisture content of the sheets within a 
skid may be regarded as the elements of that unit. 

In order to sample from this type of population, an estimate of the 
optimum subsampling number is needed. To make this estimate, a
pilot sample design was selected from Table I on the basis of estimates
of the cost ratio and of an upper bound to the possible values of the
variance ratio. It takes about six minutes to move and open a skid
and about three minutes to make a moisture content determination,
so that an estimate of the cost ratio, $C_u/C_e$, is 6/3 = 2. It was not believed
that the variation of moisture content within a skid would be more
than twice the variation of moisture content from skid to skid, so that
an estimate of an upper bound to the variance ratio, $S_w^2/S_u^2$, is 2. From
these parameters, Table I indicates that the appropriate pilot sample
should consist of $k = 7$ moisture content determinations to be made at
random from each of $h = 7$ skids sampled at random from the shipment.
The results of the pilot sample are presented in Table II. 


TABLE II


MEASUREMENTS OF PERCENTAGE MOISTURE CONTENT
IN A SHIPMENT OF 300 SKIDS


[Rows: within-skid measurement number. Columns: coded skid number
(116, 231 and 235 among them), with skid totals in the last row. The
individual measurements are illegible in this copy.]


The analysis of variance is computed in the usual manner:

                                      Degrees of    Sum of      Mean
Source of Variation                   Freedom       Squares     Squares

Between Skids                              6        4.2589      0.7098 = $s_u^2$
Between Measurements within Skids         42        1.9486      0.0464 = $s_w^2$
Total                                     48        6.2075

From equation (7) the estimate of the optimum subsampling number is

$$m_{op} = \left[\frac{(7)(2)}{\dfrac{0.7098}{0.0464} - 1}\right]^{1/2} = 0.990.$$

Since $1(1-1) \le (0.990)^2 < 1(1+1)$, it may be seen from Section II E
that the subsampling number to be used is $m = 1$.
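
The arithmetic of this example can be retraced as follows. This is a minimal sketch: the function names are hypothetical, and raising an error when the between-unit mean square does not exceed the within-unit mean square is a choice made here, not a rule taken from the paper.

```python
import math

def estimate_m_op(ms_between, ms_within, k, cost_ratio):
    """Estimate of the optimum subsampling number from pilot mean squares, equation (7)."""
    f_ratio = ms_between / ms_within
    if f_ratio <= 1.0:
        # Choice made for this sketch: no finite estimate exists in this case.
        raise ValueError("between-unit mean square does not exceed within-unit mean square")
    return math.sqrt(k * cost_ratio / (f_ratio - 1.0))

def integer_subsampling_number(m_op):
    """Integer m satisfying m(m-1) <= m_op^2 < m(m+1), never less than 1."""
    return max(1, math.floor((1.0 + math.sqrt(1.0 + 4.0 * m_op ** 2)) / 2.0))

# Pilot sample of h = 7 skids with k = 7 determinations each; cost ratio 6/3 = 2.
m_op = estimate_m_op(0.7098, 0.0464, k=7, cost_ratio=2)
m = integer_subsampling_number(m_op)
variance_ratio = 7 / (0.7098 / 0.0464 - 1.0)  # estimate of S_w^2 / S_u^2

print(round(m_op, 3), m, round(variance_ratio, 2))
```

The same mean squares give an estimated variance ratio of about 0.49.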


V. THE RANGE OF VARIANCE RATIOS FOR WHICH A PARTICULAR 
SUBSAMPLING NUMBER IS ACCEPTABLE 


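A particular subsampling number, $m_a$, having been selected, it is of
interest to know for what interval of variance ratios $m_a$ will have an
acceptable relative precision when the cost ratio is fixed. We shall
designate the least acceptable $RP(m_a)$ as $L$. The limits of this interval
of variance ratios are found from equation (8) by solving for $S_w^2/S_u^2$ in
terms of $C_u/C_e$, $L$, and $m_a$,

$$\frac{S_w^2}{S_u^2} = m_a\left[\frac{\sqrt{\dfrac{C_u}{C_e m_a}} \pm \left(\dfrac{C_u}{C_e m_a} + 1\right)\sqrt{L(1-L)}}{L\left(\dfrac{C_u}{C_e m_a} + 1\right) - 1}\right]^2.$$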











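When the least acceptable relative precision of $m_a$ is 90 per cent, that
is, $L = .90$, these limits are given by

$$\frac{S_w^2}{S_u^2} = m_a\left[\frac{10\sqrt{\dfrac{C_u}{C_e m_a}} \pm 3\left(\dfrac{C_u}{C_e m_a} + 1\right)}{9\,\dfrac{C_u}{C_e m_a} - 1}\right]^2. \tag{12}$$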








Table III was computed from equation (12). For several cost ratios it 
indicates the interval of the variance ratio over which the subsampling 
number tabulated will have a relative precision of at least 90 per cent. 
In terms of the example in the previous section in which the cost ratio
is $C_u/C_e = 2$ and the subsampling number selected is $m_a = 1$, Table III
shows that this subsampling number will have a relative precision of at
least 90 per cent if, in fact, the true variance ratio is between 0.0 and 
1.9. The estimate of the variance ratio from the pilot sample is 0.49. 
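
The interval quoted for this example can be checked directly. Equation (8) is not reproduced in this section, so the sketch below assumes the usual form of the relative precision for infinite $N$ and $M$, namely $RP(m) = m(\sqrt{a} + \sqrt{b})^2 / \{(m + b)(a + m)\}$ with $a = C_u/C_e$ and $b = S_w^2/S_u^2$. Both the function name and that form are assumptions.

```python
import math

def relative_precision(m, cost_ratio, var_ratio):
    """Assumed form of equation (8) for infinite N and M.

    Ratio of the variance attained with the optimum subsampling number to the
    variance attained with m, both at the same fixed cost.
    """
    a, b = cost_ratio, var_ratio
    return m * (math.sqrt(a) + math.sqrt(b)) ** 2 / ((m + b) * (a + m))

# m = 1 with C_u/C_e = 2, over a range of variance ratios.
for b in (0.49, 1.0, 1.8, 2.0):
    print(b, round(relative_precision(1, 2.0, b), 3))
```

Under this assumed form the relative precision of $m = 1$ stays above 90 per cent up to a variance ratio of roughly 1.85 and falls below it at 2.0, in line with the upper limit read from Table III.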

It is possible to find a moderate subsampling number which has 
good relative precision for high variance ratios. The least subsampling
number which will have a relative precision $L$ for an infinite variance
ratio is

$$m = \left[\frac{L}{1-L}\right]\frac{C_u}{C_e}.$$

This subsampling number also has relative precision $L$ for variance
ratios as small as

$$\left[\frac{L - 0.5}{1 - L}\right]^2\frac{C_u}{C_e}.$$

For example, when $L = .90$, the subsampling number $9C_u/C_e$ will have
a relative precision of at least 90 per cent for all variance ratios greater
than $16C_u/C_e$. By this means a subsampling number having acceptable
relative precision can be selected when it is known that the variance
ratio is high.
It is also possible to find a subsampling number having good relative
precision over a large interval of small variance ratios. The greatest
subsampling number having relative precision $L$ for a zero variance ratio
is

$$m = \left[\frac{1-L}{L}\right]\frac{C_u}{C_e}$$

if this quantity is greater than one. In this case, this subsampling
number also has relative precision $L$ for variance ratios as large as

$$\left[\frac{1-L}{L-0.5}\right]^2\frac{C_u}{C_e}.$$

For example, when $L = .90$ and $(C_u/C_e) \ge 9$, the subsampling number

$$m = \left[\frac{1}{9}\right]\frac{C_u}{C_e}$$

will have a relative precision of at least 90 per cent for all variance ratios
less than

$$\left[\frac{1}{16}\right]\frac{C_u}{C_e}.$$

If

$$\left[\frac{1-L}{L}\right]\frac{C_u}{C_e}$$

is less than one, the subsampling number $m = 1$ may be considered to
have at least relative precision $L$ for the zero variance ratio, since this
is the least subsampling number possible. Thus, a subsampling number
having acceptable relative precision can be selected when it is known
that the variance ratio is low.
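
The two limiting rules can be put in a few lines. This sketch is illustrative: the bracketed factors are written so as to reproduce the values 9, 16, 1/9 and 1/16 quoted for $L = .90$, and the function names are hypothetical.

```python
def high_ratio_design(L, cost_ratio):
    """Least m with relative precision L at an infinite variance ratio,
    and the smallest variance ratio for which that m still attains L."""
    m = L / (1.0 - L) * cost_ratio
    smallest_ratio = ((L - 0.5) / (1.0 - L)) ** 2 * cost_ratio
    return m, smallest_ratio

def low_ratio_design(L, cost_ratio):
    """Greatest m with relative precision L at a zero variance ratio,
    and the largest variance ratio for which that m still attains L.

    When the formula gives less than one, m = 1 is returned and no upper
    limit is computed, since the closed form no longer applies.
    """
    m = (1.0 - L) / L * cost_ratio
    if m < 1.0:
        return 1, None
    largest_ratio = ((1.0 - L) / (L - 0.5)) ** 2 * cost_ratio
    return m, largest_ratio

print(high_ratio_design(0.90, 2.0))   # about (18, 32)
print(low_ratio_design(0.90, 18.0))   # about (2, 1.125)
print(low_ratio_design(0.90, 2.0))    # (1, None)
```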

It may be inferred from this discussion and from Table III that a 
subsampling number having high relative precision may be estimated 
without a pilot sample if the order of magnitude of the variance ratio 
is known, or if the variance ratio is known to be quite large or quite 
small. 


VI. OTHER PROBLEMS CONCERNING THE USE OF A PILOT SAMPLE IN 
THE DESIGN OF A TWO-STAGE SAMPLE 


This paper has indicated methods of designing two-stage samples in 
several situations. When the cost and variance parameters are well 
known, the results of the elementary theory reviewed in Section II 
may be used. When the cost of the main sample has been fixed and the 
cost and variance ratios are known roughly, the results of Section V 
indicate the adequacy of a subsampling number that has been inferred 
without benefit of a pilot sample. When the cost of the main sample 
has been fixed and an upper limit to the variance ratio may be as- 
serted, a pilot sample design can be selected from Table I to be used 
to estimate the optimum subsampling number. When a subsampling 
number has been determined and the cost of the main sample has 
been fixed, the number of units to be sampled is found from equa- 
tion (2). 

There are other situations, however, in which the investigator would 
like to obtain an estimate of the population mean having a specified 
variance, this to be done at least cost. Here the pilot sample would be 
used to estimate the variance components themselves, not just their 
ratio, and these variance components are used to estimate the number 
of units to be sampled as well as the subsampling number. The ade- 
quacy of such a design indicated by the pilot sample may be judged in 
terms of how near the actual variance of this design is to the desired 
variance and how low the cost of executing this design is. 

The pilot sample may also be used to estimate the population mean 
and to furnish information as to how large an additional sample need 
be to provide a combined estimate of the population mean having a 
specified variance. In this situation pilot sample designs that are 
optimum in any reasonable sense seem quite difficult to construct. 


EXTENDED TABLES FOR USE WITH THE "G" TEST
FOR MEANS


J. Edward Jackson and Eleanor L. Ross
Eastman Kodak Company


This paper uses existing statistical theory to transform the 
tables of Lord into a form such that the significance tests may 
be made quickly without the aid of either a desk calculator or 
a slide rule. 


INTRODUCTION 


There has been a trend in the past few years towards the use of
approximate statistical methods. These methods are often
somewhat inefficient but this loss in efficiency is usually more than
compensated by the amount of time and effort saved in computing
the results. Such a case is the "G" test for means as an approximation
to the one-sample "t" test of the form

$$t = \frac{|\bar{X} - \mu|}{s/\sqrt{n}},$$

where

$\bar{X}$ = Sample Mean
$\mu$ = Hypothetical Mean
$s = \sqrt{\dfrac{n\sum X^2 - (\sum X)^2}{n(n-1)}}$
$n$ = Sample Size.

The alternative "G" test replaces $s/\sqrt{n}$ by $R$, the sample range, and is
of the form

$$G = \frac{|\bar{X} - \mu|}{R}.$$

The distribution of this statistic was found by Daly [1] in 1946.
Lord [3] proposed a second alternative called the "u" test which is
of the form

$$u = \frac{|\bar{X} - \mu|}{R/(d_2\sqrt{n})},$$

where $d_2$ is the ratio of the expected value of the range to the population
standard deviation which was tabulated by Tippett [4] and can be
found tabulated in any quality control handbook. This form could
also be used if the sample were divided into random sub-groups of
equal size. The statistic is then

$$u = \frac{|\bar{X} - \mu|}{\bar{R}/(d_2\sqrt{nm})}$$

where $\bar{R}$ is the average range of $m$ sub-groups of $n$ each.
Lord [3] also proposed a two-sample test of the form

$$u = \frac{|\bar{X}_1 - \bar{X}_2|\,d_2}{\bar{R}\sqrt{\dfrac{1}{nm_1} + \dfrac{1}{nm_2}}},$$

where the sample from which $\bar{X}_1$ was obtained is made up of $m_1$
subgroups of $n$ each and the sample from which $\bar{X}_2$ was obtained is made
up of $m_2$ subgroups, also of $n$ each. $\bar{R}$ is the average range of the combined
samples. The power of these tests has been discussed by Lord [2].

Use of Tables: All u-tests may be converted into simpler G-tests by 
suitable transformations. The use of such G-tests effects a considerable 
saving in computing time, particularly for the two-sample test. 

The significance of the difference between a sample mean $\bar{X}$ and a
hypothetical value $\mu$ may be determined using a G-test in the form

$$G_1 = \frac{|\bar{X} - \mu|}{\bar{R}} = \frac{u}{d_2\sqrt{nm}}.$$

Table I contains percentage points of the distribution of $G_1$ for
significance levels $\alpha = .10$, .05, and .01; $n = 2, 3, \ldots, 15$; $m = 1, 2, \ldots, 15$.
The special case $m = 1$, with a single range $R$ in a sample of $n$ values,
is the original G-test of Daly.

The G-test for assessing the significance of the difference between
two sample means takes the form

$$G_2 = \frac{|\bar{X}_1 - \bar{X}_2|}{\bar{R}} = \frac{u\sqrt{\dfrac{1}{nm_1} + \dfrac{1}{nm_2}}}{d_2}.$$











Table II, in a quadruple entry form, gives the percentage points of the 


TABLE I


PERCENTAGE POINTS FOR "G" TEST WHEN
COMPARING A SAMPLE MEAN TO A
HYPOTHETICAL MEAN


$G_1 = |\bar{X} - \mu| / \bar{R}$

m = number of subgroups
n = subgroup size


TABLE II


PERCENTAGE POINTS FOR "G" TEST WHEN COMPARING
INDEPENDENT SAMPLE MEANS


$G_2 = |\bar{X}_1 - \bar{X}_2| / \bar{R}$

$m_1$ = number of subgroups in sample from which $\bar{X}_1$ was obtained
$m_2$ = number of subgroups in sample from which $\bar{X}_2$ was obtained
n = subgroup size
These tables are symmetric.

[Table II continues through subgroup size n = 15, with one panel for each
subgroup size and entries at the .10, .05 and .01 levels for each
combination of $m_1$ and $m_2$. The tabulated percentage points are
illegible in this copy.]

distribution of $G_2$ for $\alpha = .10$, .05 and .01; $n = 2, 3, \ldots, 15$; $m_1$ and
$m_2 = 1, 2, \ldots, 15$.
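
Because only sample means and ranges are involved, the two G statistics need very little computation. The following sketch, with hypothetical function names and made-up data, computes $G_1$ and $G_2$ from samples already divided into subgroups of equal size; the result would then be compared with the appropriate entry of Table I or Table II.

```python
def mean(values):
    return sum(values) / len(values)

def average_range(subgroups):
    """Average of the subgroup ranges (largest minus smallest value)."""
    return mean([max(g) - min(g) for g in subgroups])

def g1(subgroups, mu):
    """One-sample statistic: |sample mean - mu| / average range."""
    all_values = [x for g in subgroups for x in g]
    return abs(mean(all_values) - mu) / average_range(subgroups)

def g2(sample1, sample2):
    """Two-sample statistic: |difference of means| / average range of all subgroups."""
    x1 = mean([x for g in sample1 for x in g])
    x2 = mean([x for g in sample2 for x in g])
    return abs(x1 - x2) / average_range(sample1 + sample2)

# Made-up data: m = 2 subgroups of n = 3, tested against mu = 2.
print(g1([[1, 2, 3], [2, 3, 4]], mu=2))
# Made-up data: one subgroup of n = 3 in each sample.
print(g2([[1, 2, 3]], [[3, 4, 5]]))
```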

Tables I and II were computed directly from Lord’s tables of the 
percentage points of the distribution of u, making the transformations 
indicated above. For certain values of m which were not included in the 
original tables, the corresponding percentage points for u were ob- 
tained using a five-point Lagrangian interpolation formula as suggested 
by Lord [3]. 

These tests are of the two-tail variety. The significance levels to be
used with one-tail tests are $\alpha/2$. In applications of the G-test the usual
assumptions are made concerning normality, homogeneity of variances 
and randomness in the distribution of sample values among the sub- 
groups. 


THE PROPERTIES OF THE MEAN SQUARE SUCCESSIVE
DIFFERENCE IN SAMPLES FROM VARIOUS 
POPULATIONS 


P. G. Moore 
University College, London and Princeton University 


This paper is concerned with the use of the mean square 
successive difference as a method of estimating variance and 
testing for homogeneity in samples from various non-normal 
populations where the order of the observations can be re- 
corded. It is shown that the relative efficiency of such esti- 
mates of variance increases as the kurtosis of the population 
sampled increases and any trend effect is virtually eliminated. 
The actual distribution varies but can be approximately dealt 
with by means of a Pearson Type III curve with the correct 
moments. 


1. INTRODUCTION 


In all the previous work on the properties and uses of the mean
square successive difference in random samples from some parent
population the assumption has been made that the sampling is from a 
normal parent population. In this paper we consider the case of 
sampling from non-normal populations and discuss the use of the 
mean square successive difference both as an estimator of the popula- 
tion variance in certain circumstances and also as a test for the inde- 
pendence or dependence of a sequence of observations. Although 
analytic expressions are not obtained for the distribution, its form 
when the sampling is from 


a) Normal population, 

b) Rectangular population, 

c) Double exponential population, 
d) Pearson Type III population, 


is discussed in general terms from the standpoint of moments and 
suggestions made as to the best method of obtaining a required signifi- 
cance point. The relationship between the forms of the distributions of 
this statistic and the sample squared standard deviation is also con- 
sidered and finally a transformation which may be of considerable 
value in obtaining significance points to the generally required ac- 
curacy is suggested. 


2. HISTORICAL


The usual method for estimating the variance of a population on the
basis of a sample is to use the sample variance

$$s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2 \tag{2.1}$$

where $x_1, x_2, \ldots, x_n$ are the sample observations. (2.1) is an unbiased
estimator and although other estimators such as the mean deviation
or sample range are sometimes used it is generally regarded as the basic 
method. Sometimes we have the situation where a gradual shift in the 
mean takes place as the sampling progresses and yet we still want the 
actual value of the variance of a single determination. An example 
occurs in ballistics and weapon testing where it is almost impossible 
because of atmospheric conditions to get two successive observations 
under exactly the same conditions yet for the calculation of range 
tables and so forth it is necessary to know the variance to be expected 
amongst a group of projectiles fired under exactly similar conditions. 
Situations will also occur, though, where this does not apply and it 
is desired to take any gradual shift into the variability measure and 
thus obtain an over-all picture of the variability. A consumer buying 
large batches of coal would perhaps be more interested in the vari- 
ability of the whole batch than in small parts of the batch from the 
point of view of fixing the conditions under which the coal is being 
used. 

If instead of using the estimators mentioned above we use some form 
of estimator based on successive differences, we will considerably
reduce any errors due to a trend effect. Vallier [15] estimated the
dispersion from successive differences and Cranz and Becker [3] used the
mean successive difference

$$d = \frac{1}{n-1}\sum_{i=1}^{n-1}|x_{i+1} - x_i| \tag{2.2}$$

to estimate the dispersion where $x_1, x_2, \ldots, x_n$ are the observations in
their temporal order. Kamat [8] has discussed the statistic $d$ in some
detail for the case of samples from a normal population. He has
derived the first four moments of $d$ and $d/s$ and discussed the distributions
of the ratios $d/s$ and $d/\sigma$, giving some percentage points. The statistic
d/s has value in quality control work where it can be used to detect a 
trend; its value being lower in this case than it would be in sampling 
from a homogeneous population. Keen and Page [9] have discussed 
the use of d in this connection and shown its great potentialities due to 
the simplicity of application. 
An alternative statistic is

$$\delta^2 = \frac{1}{n-1}\sum_{i=1}^{n-1}(x_{i+1} - x_i)^2 \tag{2.3}$$

where the $x$'s are in temporal order as before. The statistic was
investigated by Von Neumann, Kent, Bellinson, and Hart [11] for the case
of samples from a normal population and they discuss the form of the
distribution of $\delta^2$ and its use as an estimator of variance. The ratio
$\delta^2/s^2$ was further investigated by Williams [16], Von Neumann [10],
Hart and Von Neumann [7], and Hart [6]. The last paper gives some
significance points for $\delta^2/s^2$ in samples from a normal population.

Guest [5] has compared the efficiencies of various statistics using
mean differences and mean square differences to estimate the
population variance. He demonstrates that if the observations are drawn
from a normal population the asymptotic efficiencies of the statistics
$d$ and $\delta^2$, as the number of observations in the sample gets large, are
60.5 per cent and 66.7 per cent respectively where the sample variance
estimate has, of course, an efficiency of 100 per cent.

In this paper we are going to discuss the use of $\delta^2$ to estimate the
variance of the parent population and to compare it with two alternative
estimators. We will then examine the form of the distribution of $\delta^2$ in
samples from various populations and how approximate significance
points might be obtained for the distribution.
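
The three statistics (2.1), (2.2) and (2.3) can be compared on a short series. The sketch below is illustrative only; the function names are hypothetical and the data are made up to show a steady upward drift.

```python
def sample_variance(x):
    """s^2 of equation (2.1)."""
    n = len(x)
    mean = sum(x) / n
    return sum((xi - mean) ** 2 for xi in x) / (n - 1)

def mean_successive_difference(x):
    """d of equation (2.2): average absolute difference of neighbouring observations."""
    return sum(abs(b - a) for a, b in zip(x, x[1:])) / (len(x) - 1)

def mean_square_successive_difference(x):
    """delta^2 of equation (2.3): average squared difference of neighbouring observations."""
    return sum((b - a) ** 2 for a, b in zip(x, x[1:])) / (len(x) - 1)

# Made-up observations in temporal order, with a pronounced upward trend.
x = [1, 2, 4, 7]
s2 = sample_variance(x)
d = mean_successive_difference(x)
delta2 = mean_square_successive_difference(x)
print(s2, d, delta2, delta2 / 2)
```

For this series $s^2 = 7$ while $\delta^2/2$ is about 2.3, which illustrates how successive differences discount a gradual shift in the mean.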





3. MOMENTS OF $\delta^2$


The statistic that we consider is

$$\delta^2 = \frac{1}{n-1}\sum_{i=1}^{n-1}(x_{i+1} - x_i)^2.$$



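Let $\mu_r$ be the moments ($r = 2, 3, \ldots$) about the mean of the sampled
population, and $\mu_1'$ the first moment about the origin.
Now

$$\begin{aligned}
(n-1)\delta^2 &= \sum_{i=1}^{n-1}(x_{i+1} - x_i)^2 \\
&= \sum_{i=1}^{n-1}\{(x_{i+1} - \mu_1') - (x_i - \mu_1')\}^2 \\
&= \sum_{i=1}^{n-1}(X_{i+1} - X_i)^2, \quad \text{where } X_i = x_i - \mu_1', \\
&= 2\sum_{i=1}^{n}X_i^2 - X_1^2 - X_n^2 - 2\sum_{i=1}^{n-1}X_iX_{i+1}.
\end{aligned} \tag{3.1}$$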
We now find the moments of (3.1) assuming that all the $x$'s are
independent. By taking expectations of (3.1) we have

$$(n-1)\mu_1'(\delta^2) = E\left[2\sum_{i=1}^{n}X_i^2 - X_1^2 - X_n^2 - 2\sum_{i=1}^{n-1}X_iX_{i+1}\right] = 2(n-1)\mu_2.$$

Similarly

$$(n-1)^r\mu_r'(\delta^2) = E\left[2\sum_{i=1}^{n}X_i^2 - X_1^2 - X_n^2 - 2\sum_{i=1}^{n-1}X_iX_{i+1}\right]^r. \tag{3.2}$$


The right hand side of (3.2) may be expanded and expectations taken 
term by term of the resulting expressions. The crude moments may now 
be converted into moments about the mean giving 


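$$(n-1)\,\mu_1'(\delta^2) = 2(n-1)\mu_2,$$

$$(n-1)^2\mu_2(\delta^2) = 2(2n-3)\mu_4 + 2\mu_2^2,$$

$$(n-1)^3\mu_3(\delta^2) = 2(4n-7)\mu_6 + 6(4n-5)\mu_4\mu_2 - 4(8n-11)\mu_2^3 - 8(7n-13)\mu_3^2,$$

$$\begin{aligned}
(n-1)^4\mu_4(\delta^2) ={}& 2(8n-15)\mu_8 + 8(16n-27)\mu_6\mu_2 + 2(24n^2+8n-101)\mu_4^2 \\
&+ 48(-13n+24)\mu_4\mu_2^2 + (336n-624)\mu_2^4 \\
&+ 32(-16n+33)\mu_5\mu_3 + 32(25n-154)\mu_3^2\mu_2.
\end{aligned} \tag{3.3}$$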


We can apply a check to these results for the case of samples from a 
normal population for which we have 


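$$\mu_4 = 3\mu_2^2; \quad \mu_6 = 15\mu_2^3; \quad \mu_8 = 105\mu_2^4; \quad \mu_3 = \mu_5 = \mu_7 = 0;$$

and our moments (3.3) reduce to

$$\begin{aligned}
(n-1)\,\mu_1'(\delta^2) &= 2(n-1)\mu_2, \\
(n-1)^2\mu_2(\delta^2) &= 4(3n-4)\mu_2^2, \\
(n-1)^3\mu_3(\delta^2) &= 32(5n-8)\mu_2^3, \\
(n-1)^4\mu_4(\delta^2) &= 48(9n^2+46n-112)\mu_2^4
\end{aligned} \tag{3.4}$$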


and these values agree with those given in [11]. 
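
The reduction from (3.3) to (3.4) can also be confirmed mechanically. The sketch below is an illustration with hypothetical function names. It evaluates the even-moment terms of (3.3) only, which is sufficient for any symmetric population since $\mu_3 = \mu_5 = 0$ there, and compares them with (3.4) for a normal population with $\mu_2 = 1$.

```python
def moments_symmetric(n, m2, m4, m6, m8):
    """(n-1)^r * mu_r(delta^2), r = 2, 3, 4, from (3.3) with the odd moments set to zero."""
    second = 2 * (2 * n - 3) * m4 + 2 * m2 ** 2
    third = 2 * (4 * n - 7) * m6 + 6 * (4 * n - 5) * m4 * m2 - 4 * (8 * n - 11) * m2 ** 3
    fourth = (2 * (8 * n - 15) * m8 + 8 * (16 * n - 27) * m6 * m2
              + 2 * (24 * n ** 2 + 8 * n - 101) * m4 ** 2
              + 48 * (-13 * n + 24) * m4 * m2 ** 2 + (336 * n - 624) * m2 ** 4)
    return second, third, fourth

def moments_normal(n):
    """Right-hand sides of (3.4) for mu_2 = 1."""
    return 4 * (3 * n - 4), 32 * (5 * n - 8), 48 * (9 * n ** 2 + 46 * n - 112)

# Normal population with unit variance: mu_4 = 3, mu_6 = 15, mu_8 = 105.
agree = all(moments_symmetric(n, 1, 3, 15, 105) == moments_normal(n) for n in range(2, 50))
print(agree)
```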
At this stage we will give the moments of (2.1) which are used later. 
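They have been given in a slightly different form by Church [2].

$$\mu_1'(s^2) = \mu_2,$$

$$\mu_2(s^2) = \frac{\mu_4}{n} - \frac{(n-3)}{n(n-1)}\mu_2^2,$$

$$\mu_3(s^2) = \frac{\mu_6}{n^2} - \frac{3(n-5)}{n^2(n-1)}\mu_4\mu_2 - \frac{2(3n^2-6n+5)}{n^2(n-1)^2}\mu_3^2 + \frac{2(n^2-12n+15)}{n^2(n-1)^2}\mu_2^3,$$

$$\begin{aligned}
\mu_4(s^2) ={}& \frac{\mu_8}{n^3} - \frac{4(n-7)}{n^3(n-1)}\mu_6\mu_2 - \frac{8(3n^2-6n+7)}{n^3(n-1)^2}\mu_5\mu_3
+ \frac{3n^4-12n^3+42n^2-60n+35}{n^3(n-1)^3}\mu_4^2 \\
&+ \frac{\mu_4\mu_2^2}{n^3(n-1)^3}\{-6n^4+42n^3-294n^2+630n-420\} \\
&+ \frac{16\mu_3^2\mu_2}{n^3(n-1)^3}\{6n^3-27n^2+50n-35\} \\
&+ \frac{\mu_2^4}{n^3(n-1)^3}\{3n^4-27n^3+279n^2-765n+630\}.
\end{aligned} \tag{3.5}$$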


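Finally we will consider a third alternative estimator

$$\eta^2 = \frac{1}{2m}\sum_{i=1}^{m}(x_{2i} - x_{2i-1})^2, \tag{3.6}$$

where $n = 2m$, that is we are considering the case of $n$ even. The
moments may be written down straightforwardly as

$$\mu_1'(\eta^2) = \mu_2,$$

$$\mu_2(\eta^2) = \frac{\mu_4}{2m} + \frac{\mu_2^2}{2m},$$

$$\mu_3(\eta^2) = \frac{\mu_6}{4m^2} + \frac{9\mu_4\mu_2}{4m^2} - \frac{5\mu_3^2}{2m^2} - \frac{5\mu_2^3}{2m^2},$$

$$\begin{aligned}
\mu_4(\eta^2) ={}& \frac{\mu_8}{8m^3} + \frac{5\mu_6\mu_2}{2m^3} - \frac{7\mu_5\mu_3}{m^3} + \frac{6m+29}{8m^3}\mu_4^2 + \frac{3m-27}{2m^3}\mu_4\mu_2^2 \\
&+ \frac{10}{m^3}\mu_3^2\mu_2 + \frac{3m+21}{4m^3}\mu_2^4.
\end{aligned} \tag{3.7}$$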


4. EFFICIENCY OF ESTIMATORS


In ordinary work we use the sample variance as our estimator of the 
population variance and this estimator will be taken as the standard 
although it must be realized that for many populations the sample 
variance may not be the best possible estimator. Our measure of rela- 
tive efficiency will be taken as the ratio of the variances of the statistics 
which are unbiased estimators of the population variance. Hence we 
have 


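$$\begin{aligned}
\text{Relative efficiency of } \tfrac{1}{2}\delta^2 &= \frac{\text{Variance of } s^2}{\text{Variance of } \tfrac{1}{2}\delta^2} \\
&= \frac{2\{(n-1)^2\mu_4 - (n-1)(n-3)\mu_2^2\}}{n\{(2n-3)\mu_4 + \mu_2^2\}} \\
&= \frac{2}{n}\,\frac{(n-1)^2\beta_2 - (n-1)(n-3)}{(2n-3)\beta_2 + 1}
\end{aligned} \tag{4.1}$$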


where β₂ = μ₄/μ₂². We notice that for large n the efficiency tends to the
value (β₂ - 1)/β₂ and the greater the value of β₂ the larger the asymptotic
relative efficiency of δ². Usually as n increases for a given β₂ the
efficiency drops away to its limit but for large β₂ it drops to a minimum
and then increases slightly to the limiting value. For any given n the
efficiency increases with β₂. Table 1 gives some specimen values for this
efficiency. We note that β₂ equal to three is the normal population case
and that β₂ can never be less than unity (see, for example,
Shohat [14]).
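
The expression (4.1) is easily evaluated by machine. The sketch below is ours and not part of the original computations; the function names are illustrative, and the grid of n and β₂ simply follows the layout of Table 1.

```python
def efficiency_delta_vs_s2(n, beta2):
    """Relative efficiency (4.1) of the successive difference estimator to s^2."""
    numerator = 2 * ((n - 1) ** 2 * beta2 - (n - 1) * (n - 3))
    denominator = n * ((2 * n - 3) * beta2 + 1)
    return numerator / denominator


def limiting_efficiency(beta2):
    """Large-n limit of (4.1), namely (beta2 - 1) / beta2."""
    return (beta2 - 1) / beta2


for n in (5, 10, 15, 20, 25):
    row = [round(efficiency_delta_vs_s2(n, b), 2) for b in (1, 2, 3, 4, 5, 6, 10)]
    print(n, row)
print("limit", [round(limiting_efficiency(b), 2) for b in (1, 2, 3, 4, 5, 6, 10)])
```

For the normal case, β₂ = 3 and n = 5, the function returns 80/110, or 0.73 to two places.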


TABLE 1
RELATIVE EFFICIENCY OF δ² TO s²

                               β₂
   n       1      2      3      4      5      6      10     ∞

   5      0.40   0.64   0.73   0.77   0.80   0.82   0.86   0.91
  10      0.20   0.57   0.69   0.76   0.80   0.82   0.87   0.95
  15      0.13   0.54   0.68   0.75   0.80   0.82   0.88   0.97
  20      0.10   0.53   0.68   0.75   0.80   0.83   0.89   0.98
  25      0.08   0.53   0.68   0.75   0.80   0.83   0.89   0.98
   ∞      0      0.50   0.67   0.75   0.80   0.83   0.90   1.00
The statistic η² has a relative efficiency, compared with δ², given by
the expression

    Relative efficiency of \eta^2 to \delta^2
        = \frac{n}{(n-1)^2} \cdot \frac{(2n-3)\beta_2 + 1}{2(\beta_2 + 1)},    (4.2)

where
    n = 2m.

Some specimen values of this efficiency are given in Table 2. We note
that as n gets large the efficiency approaches β₂/(β₂ + 1), which shows
that it never quite becomes as efficient an estimate as δ².

















TABLE 2
RELATIVE EFFICIENCY OF η² TO δ², AND IN PARENTHESES, η² TO s²

                                      β₂
   n         1             1.5            3              5              8

   6     0.60 (0.17)   0.70 (0.36)   0.84 (0.60)   0.92 (0.73)   0.98 (0.82)
  10     0.56 (0.11)   0.65 (0.29)   0.80 (0.56)   0.89 (0.70)   0.94 (0.80)
  20     0.53 (0.05)   0.63 (0.24)   0.78 (0.53)   0.86 (0.68)   0.91 (0.79)
  30     0.52 (0.03)   0.62 (0.23)   0.77 (0.52)   0.85 (0.68)   0.91 (0.79)
  50     0.51 (0.02)   0.61 (0.22)   0.76 (0.51)   0.84 (0.67)   0.90 (0.78)
 100     0.50 (0.01)   0.60 (0.21)   0.76 (0.51)   0.84 (0.67)   0.89 (0.78)








The relative efficiency of η² compared with s² is given by the
expression

    Relative efficiency of \eta^2 to s^2
        = \frac{(n-1)\beta_2 - (n-3)}{(n-1)(\beta_2 + 1)},    (4.3)

and the figures for this are given in parentheses in Table 2.
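
The three efficiencies are linked, since the efficiency of η² to s² is the product of the efficiency of η² to δ² and that of δ² to s². The following sketch, with function names of our own choosing, evaluates (4.2) and (4.3) as we read them and checks this product relation numerically.

```python
def efficiency_delta_vs_s2(n, beta2):
    """Expression (4.1)."""
    return 2 * ((n - 1) ** 2 * beta2 - (n - 1) * (n - 3)) / (n * ((2 * n - 3) * beta2 + 1))


def efficiency_eta_vs_delta(n, beta2):
    """Expression (4.2), with n = 2m."""
    return n * ((2 * n - 3) * beta2 + 1) / (2 * (n - 1) ** 2 * (beta2 + 1))


def efficiency_eta_vs_s2(n, beta2):
    """Expression (4.3)."""
    return ((n - 1) * beta2 - (n - 3)) / ((n - 1) * (beta2 + 1))


for n in (6, 10, 20, 30, 50, 100):
    cells = []
    for b in (1, 1.5, 3, 5, 8):
        cells.append((round(efficiency_eta_vs_delta(n, b), 2),
                      round(efficiency_eta_vs_s2(n, b), 2)))
        # (4.3) should equal (4.2) multiplied by (4.1)
        product = efficiency_eta_vs_delta(n, b) * efficiency_delta_vs_s2(n, b)
        assert abs(product - efficiency_eta_vs_s2(n, b)) < 1e-12
    print(n, cells)
```

Small differences in the second decimal place from the printed table may arise from rounding in the original hand computation.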
In the particular case of β₂ = 1, which corresponds to a frequency
distribution with just two equal ordinates, it is of interest to note that
s² becomes a "super efficient" estimate since the variance reduces to

    \mu_2(s^2) = \frac{2\mu_2^2}{n(n-1)},

and thus decreases in the order 1/n² instead of the usual 1/n.

Generally the relative efficiency of δ² increases with β₂ and similarly
for η². The relative efficiency of η² also catches up that of δ² as β₂
increases. Hence the use of δ² will be of greatest value when the parent
population is of leptokurtic form. We notice that the skewness of the
parent population does not affect our results in any way and only the
kurtosis matters. We also see that the size of the sample has remarkably
little effect on the relative efficiencies, except for very low values of β₂,
and the efficiencies are dependent almost entirely on the β₂ of the parent
population.


5. BIAS OF ESTIMATORS 
To examine the question of bias we use a method similar to that
utilized by Kamat [8] in the case of the statistic d. Let the mean of the
sampled population be θ_i when the ith observation is taken and the
standard deviation be σ throughout. Let Δθ_i/σ = (θ_{i+1} - θ_i)/σ be small
so that the third and higher powers may be neglected. Then by following
a similar method to that used in Section 3, taking, for example,

    (n-1)\delta^2 = \sum_{i=1}^{n-1} \{ (x_{i+1} - \theta_{i+1}) - (x_i - \theta_i) + (\theta_{i+1} - \theta_i) \}^2,

we obtain the following results, where

    \bar{\theta} = \frac{1}{n} \sum_{i=1}^{n} \theta_i,
n—1 Ad; ' 
(252) = 3 + a on 
mee 8 = Daw)? 
sey ae 4 )2 
(88) = fon aa tit—F (ao? 6.2) 


us 
— (A0,)(A@43) | + — (A012 — a0) , 
Me 


The last term will be zero for a linear trend with observations taken at
equal intervals and can usually be safely ignored. For η² we have

    \mu_1'(\eta^2) = \mu_2 + \frac{1}{2m} \sum_{i=1}^{m} (\Delta\theta_{2i-1})^2,    (5.3)


1 





2 m 
who = {s. t+? Ady.* (5.4) 
2m 


Me im 
and for s², we have





} (0; — 8)? 
pa’(s?) = we | 1 + in 7 (5.5) 
2 n—3 +t = P . 
bo(s*) = be + - ——o + m— Dim Din > (0; — 8) \ . (5.6) 


Equations (5.1) and (5.5) are given by Kamat [8] and in the cases where 
the drawings come from a normal population (5.2) and (5.6) reduce to 
the values that Kamat gives. 

We will take two specific cases as illustrations of these general for- 
mulas. 


Example 1.
    θ_i = μ₁′ + (0.05)iσ,    i = 1, 2, ..., n = 20.

The percentage bias in the estimate δ² = 0.125
The percentage bias in the estimate η² = 0.125
The percentage bias in the estimate s² = 8.750.


Example 2.
    θ_i = μ₁′ + (0.02)iσ                     for i = 1, 2, ..., m = 10
        = μ₁′ + (0.2)σ - (0.03)(i - 10)σ     for i = 11, 12, ..., n = 20.

This is a case where the mean increases slowly linearly and then
decreases linearly at a different rate.

The percentage bias in the estimate δ² = 0.033
The percentage bias in the estimate η² = 0.032
The percentage bias in the estimate s² = 0.728

We notice how close the biases in δ² and η² come. In fact for a trend that
is exactly linear all the time they give the same percentage bias and as
δ² has the lower variance it is clearly better to use it. s² has a very much
larger bias. In example 2 the bias is 23 times as large as that of δ² and in
example 1 it is 70 times as large. The disparity between these two ratios
is mainly due to a certain amount of averaging out in s² in example 2
due to the trend increasing and then decreasing.
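
The figures of Example 1 may be checked with a short sketch. Here we assume that, to the order of approximation used, the bias in the successive difference estimate is the sum of the squared Δθ_i divided by 2(n - 1), that the bias in η² is the sum in (5.3), and that the bias in s² is the sum of squared deviations of the θ_i from their mean divided by (n - 1). The value n = 20 is likewise our assumption, taken because it reproduces the quoted 8.750 per cent. All means are expressed in units of σ, and the function name is ours.

```python
def percentage_biases(theta):
    """Percentage biases of (delta-based, eta-based, s^2) estimates for trend means theta.

    theta holds the population means in units of sigma, so that the true
    variance is 1 and each bias is already a fraction of it.
    """
    n = len(theta)
    m = n // 2
    steps = [b - a for a, b in zip(theta, theta[1:])]
    bias_delta = sum(s * s for s in steps) / (2 * (n - 1))
    bias_eta = sum((theta[2 * i + 1] - theta[2 * i]) ** 2 for i in range(m)) / (2 * m)
    centre = sum(theta) / n
    bias_s2 = sum((t - centre) ** 2 for t in theta) / (n - 1)
    return tuple(100 * b for b in (bias_delta, bias_eta, bias_s2))


# Example 1: linear trend of 0.05 sigma per observation, n = 20 (assumed)
example_1 = [0.05 * i for i in range(1, 21)]
print([round(v, 3) for v in percentage_biases(example_1)])
```

The ratio of the third figure to the first is 70, in agreement with the remark made above about example 1.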

It is also of interest to see the effect that these slow shifts have on the
variance of the estimators. First we make a few general remarks. The
variances do not depend in any way, to the order of approximation
considered, on the skewness of the populations sampled but do depend on
the amount of kurtosis present. If we imagine taking a sample of size n
with a fixed set of values θ_i, then for a whole series of populations with
the same variance the populations with the lower values of β₂ will
produce the lower sample variances of the estimators of variance. If we
consider example 1 again and imagine sampling from the three
populations

    a) Rectangular    β₂ = 1.8
    b) Normal         β₂ = 3
    c) χ² type        β₂ = 6

all three populations having the same variance, then the percentage
changes in the variances of the estimators over the variances that would
occur with no trend present are as given in Table 3.











TABLE 3
PERCENTAGE CHANGES IN VARIANCES OF ESTIMATORS

Estimator    Rectangular    Normal     χ² type

   δ²            0            0          0
   η²            0.045        0.031      0.018
   s²           40.698       17.500      7.216











From this table we see that δ² and η² give a negligible change, in fact
for a linear change δ² gives zero, whilst s² gives a huge difference which
diminishes as β₂ increases. We must emphasize that this variance is
that of the three statistics in repeated samples drawn from the
populations specified where the linear trend of example 1 holds.


6. PARENT POPULATION NORMAL 


We turn now, in this and the succeeding three sections, to a
consideration of the form of the distribution of δ² in samples from various
types of population and how to obtain quick and approximate significance
points. First we consider the parent population to be normal. In this
case from (3.4) we have

    \beta_1 = \frac{16(5n-8)^2}{(3n-4)^3},    \beta_2 = \frac{3(9n^2 + 46n - 112)}{(3n-4)^2},

and in the first three columns of Table 4 the values of β₁ and β₂ are
given for a series of values of n. We notice that the values tend to the
normal curve values as n increases. Plotting them in a chart of the (β₁, β₂)
field as in Figure 1 we find that they fall in the Type VI region between
TABLE 4
VALUES OF β₁ AND β₂ IN SAMPLES FROM NORMAL POPULATION

                             1st approximation             2nd approximation
   n      β₁       β₂        ν         β₁       β₂         ν         β₁       β₂

  10    1.6058   5.5385    6.2308   1.2839   4.9258     4.9819   1.6058   5.4087
  20    0.7711   4.2168   12.8928   0.6205   3.9307    10.3743   0.7711   4.1566
  30    0.5072   3.7999   19.5581   0.4090   3.6135    15.7721   0.5072   3.7608
  40    0.3779   3.5957   26.2241   0.3051   3.4576    21.1710   0.3779   3.5668
  50    0.3011   3.4746   32.8900   0.2432   3.3648    26.5704   0.3011   3.4516
  75    0.1997   3.3146   49.5566   0.1614   3.2421    40.0696   0.1997   3.2995
 100    0.1493   3.2353   66.2230   0.1208   3.1812    53.5692   0.1493   3.2239











the bounding lines of the Type III and Type V curves and rather nearer
to the former than the latter. If the (β₁, β₂) points for the log normal
distribution are also put on the chart, it is found that our values are
above them as well. To obtain approximate significance points we may
fit a frequency curve whose moments agree with those of the statistics
concerned. Three methods suggest themselves, namely to use a Type III
or χ² distribution, a Type VI or a log normal distribution. Since the
object is to be able to obtain a quick method of getting significance
levels, the first method which leads up to an extensively tabulated
function would appear to be very much more attractive than the other
two, both of which require a three moment fit and, for the Type VI,
tables of the incomplete beta-function.

To use a Type III approximation we take δ²/σ² to be distributed in
the same way as cχ²/ν where the χ² has ν degrees of freedom. If we now
make the first two moments agree we find

    c = 2,    \nu = 4(n-1)^2 / [(2n-3)\beta_2 + 1].    (6.1)

Using these values we give in Table 4 under the heading 1st
approximation the resulting values for ν and hence for β₁ and β₂ since for a χ²
distribution

    \beta_1 = 8/\nu,    \beta_2 = 3 + 12/\nu.    (6.2)
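
A brief sketch of the two-moment fit of (6.1) and (6.2) follows. The function name is ours, and the argument beta2_parent denotes the β₂ of the sampled population, which is 3 for the normal case considered here.

```python
def chi_square_two_moment_fit(n, beta2_parent=3.0):
    """Return (nu, beta1, beta2) of the fitted chi-square, from (6.1) and (6.2)."""
    nu = 4 * (n - 1) ** 2 / ((2 * n - 3) * beta2_parent + 1)
    return nu, 8 / nu, 3 + 12 / nu


for n in (10, 20, 30, 40, 50, 75, 100):
    nu, b1, b2 = chi_square_two_moment_fit(n)
    print(n, round(nu, 4), round(b1, 4), round(b2, 4))
```

Since only the β₂ of the parent enters, the same function serves for non-normal parents; with beta2_parent = 1.8, for instance, it gives the degrees of freedom appropriate to a rectangular population.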
We see that for all the values of n the fitted approximate (β₁, β₂) point
is nearer the normal point, as regards both β₁ and β₂, than before. The
effect of using the approximate fitted χ² instead of a complete Type VI
solution is illustrated for the case of n = 50 in Table 5 under the 1st
approximation where the Type VI values are taken from [11]. The


TABLE 5
PROBABILITY THAT δ²/σ² IS LESS THAN THE STATED VALUE, FOR n = 50

  Value     Type VI    1st approx.    2nd approx.    Normal

  0.50      0.00000     0.00001        0.00000       0.00118
  0.75      0.00031     0.00049        0.00023       0.00563
  1.00      0.00674     0.00797        0.00619       0.02129
  1.25      0.04393     0.04825        0.04407       0.06418

















normal column is obtained by assuming δ² to be normally distributed
with the mean and variance given by (3.4). Our approximation slightly
over-estimates the probabilities, as measured by the Type VI which is
most likely to be nearest the true value, but not nearly so much as the
normal approximation and it would certainly be accurate enough for
most purposes. The approximation can be improved by using a fitting
of the form c₁χ²/ν + c₂ for δ²/σ². To fix the constants we must now make
three moments agree giving

    c_1 = \frac{(3n-4)^2}{(n-1)(5n-8)},    c_2 = \frac{n(n-2)}{(n-1)(5n-8)},    \nu = \frac{(3n-4)^3}{2(5n-8)^2}.    (6.3)
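
The three-moment constants and the normal approximation can both be sketched in a few lines. The constants below are obtained by us on the assumption that, for a normal parent, δ²/σ² has mean 2, variance 4(3n - 4)/(n - 1)² and β₁ = 16(5n - 8)²/(3n - 4)³; the function names are illustrative only.

```python
from statistics import NormalDist


def three_moment_fit(n):
    """Constants (c1, c2, nu) of the fit c1 * chi2 / nu + c2 for a normal parent."""
    nu = (3 * n - 4) ** 3 / (2 * (5 * n - 8) ** 2)   # makes beta1 = 8 / nu agree
    c1 = (3 * n - 4) ** 2 / ((n - 1) * (5 * n - 8))  # makes the variance agree
    c2 = 2 - c1                                      # makes the mean agree
    return c1, c2, nu


def normal_approximation_probability(value, n):
    """P(delta^2 / sigma^2 < value), treating the ratio as normally distributed."""
    variance = 4 * (3 * n - 4) / (n - 1) ** 2
    return NormalDist(mu=2.0, sigma=variance ** 0.5).cdf(value)


print(three_moment_fit(50))
for value in (0.50, 0.75, 1.00, 1.25):
    print(value, round(normal_approximation_probability(value, 50), 5))
```

The probabilities printed for n = 50 may be compared with the normal column of Table 5, and the values of ν with the 2nd approximation in Table 4.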
The new values of ν and the corresponding values for β₁ and β₂ are given
under the 2nd approximation in Table 4. In this case the β₁'s agree due
to the three moment fit and the values of β₂ are slightly lower than the
true values. In Table 5 under the heading "2nd approximation" the
values of the probabilities obtained using this second method are given.
This second approximation makes the range start at a distance c₂ above
zero instead of at zero, which must be the true start, although for all
practical purposes the start will be above zero. We also note that in this
particular case η² will be distributed as χ² with m degrees of freedom since each
pair of values gives rise to the square of a normal variate which is
distributed as χ² with one degree of freedom.

Finally in Table 6 we give some approximate upper 5 per cent points
for δ²/σ² calculated by four different methods. The χ² approximations
and the normal approximation are those we have just described and the
Pearson Type VI curve uses the value of the upper 5 per cent points of
the Pearson curves given by Pearson and Merrington [12]. The case of
n equal to 10 fell outside the latter table and was computed directly
from a fitted Type VI curve. From these results it seems apparent that
for any reasonable size n the first χ² approximation will provide an
adequate significance point even if the normal distribution is not adequate.


TABLE 6
APPROXIMATE UPPER 5% POINTS FOR δ²/σ²

Value of n                    10     20     30     40     50     75     100

χ² 1st approx.               4.15   3.45   3.15   2.99   2.88   2.70   2.61
χ² 2nd approx.               4.17   3.46   3.16   2.99   2.88   2.71   2.61
Normal                       4.22   3.54   3.25   3.08   2.96   2.78   2.68
Pearson curve                4.17   3.46   3.16   2.99   2.88   2.71   2.61
Log approx. (Section 11)     4.33   3.54   3.22   3.03   2.91   2.73   2.62





7. PLATYKURTIC PARENT POPULATION 
If the parent population be taken as rectangular with

    p(x) = 1,    0 ≤ x ≤ 1
         = 0,    elsewhere                    (7.1)


then all the odd moments are zero and the even moments have the
values

    \mu_2 = 1/12,    \mu_4 = 1/80,    \mu_6 = 1/448,    \mu_8 = 1/2304.

The value of β₁ is zero and β₂ is 1.8 giving a platykurtic distribution. In
practice we would not estimate variance by the sample variance in this
case but it is used here as an illustration of a platykurtic distribution.
Table 7 gives the values of β₁ and β₂ of δ² for the same sample sizes as
before. The values now are nearer the normal point than they were for
samples of the same size from a normal population and we find that











TABLE 7
MOMENTAL RATIOS FOR SAMPLES FROM RECTANGULAR POPULATION

                             1st approximation              2nd approximation
   n      β₁       β₂         ν         β₁       β₂          ν         β₁       β₂

  10    0.5546   3.7282    10.2532   0.7802   4.1703     14.4248   0.5546   3.8319
  20    0.2673   3.3137    21.3609   0.3745   3.5617     29.9065   0.2673   3.4011
  30    0.1760   3.1994    32.4710   0.2464   3.3696     45.4545   0.1760   3.2640
  40    0.1318   3.1460    43.5692   0.1836   3.2754     60.6980   0.1318   3.1977
  50    0.1048   3.1152    54.6925   0.1463   3.2195     76.3358   0.1048   3.1572
  75    0.0694   3.0754    82.4699   0.0970   3.1455    115.2738   0.0694   3.1041
 100    0.0520   3.0560   110.2474   0.0726   3.1089    153.8640   0.0520   3.0780














they actually fall in the Type I region of the (β₁, β₂) plane. The Type I
curve is very tedious to fit, having four constants to be fixed, and a χ²
may once again be a suitable approximation in that it has the range
starting at zero and is near the actual momental ratios. Using the same
procedure as before we find

    c = 2,    \nu = (n-1)^2 / (0.9n - 1.1),    (7.2)

and the fitted points are given in the table. A gain in accuracy may once
more be obtained by making the start slightly positive, the values being
given under the second approximation. From this it seems that χ² can
be used as a reasonable representation of the distribution.

8. LEPTOKURTIC PARENT POPULATION 


As an example of this kind of parent population we consider the first
law of Laplace or the double exponential distribution where

    p(x) = \tfrac{1}{2} e^{-|x|},    -\infty < x < \infty.    (8.1)

It is easy to show for this distribution that μ_{2r} = (2r)!, whilst the odd
moments are all zero. Hence we have

    \mu_2 = 2,    \mu_4 = 24,    \mu_6 = 720,    \mu_8 = 40320,

giving β₂ = 6 which indicates a leptokurtic distribution. Substituting
TABLE 8
MOMENTAL RATIOS IN SAMPLES FROM DOUBLE EXPONENTIAL POPULATION

                             1st approximation
   n      β₁        β₂        ν         β₁       β₂

  10    5.4840   13.3579    3.1456   2.5432   6.8148
  20    2.6193    7.9196    6.4753   1.2355   4.8533
  30    1.7200    6.2249    9.8076   0.8157   4.2235
  40    1.2803    5.3985   13.1404   0.6089   3.9134
  50    1.0196    4.9092   16.4734   0.4856   3.7284
  75    0.6757    4.2644   24.8063   0.3225   3.4838
 100    0.5052    3.9451   33.1395   0.2414   3.3621








these moments in (3.4) we get the β₁, β₂ values for δ² and they are given
in Table 8. Once again they fall in the Pearson Type VI region, this
time very close to the log-normal line.

The β₁ and β₂ points for the first kind of χ² approximation are given.
For the second kind of approximation the values of β₁ would agree but
the relative accuracy of β₂ would not be as good as it was in the case of
sampling from a normal population because these points all fall below
the locus of points for δ² in samples from a normal curve.

It thus seems that as the kurtosis of the parent population rises the
resultant locus of (β₁, β₂) points for δ² pivots approximately on the normal
point and swings from the Type I region through the Type III line into
the Type VI region.


9. SKEW PARENT POPULATION 


So far we have considered parent populations with varying amounts 
of kurtosis but they have all had zero skewness. We now take a case 
where there is both skewness and kurtosis present. 

The particular distribution that has been used is that of a χ² variate
with ν degrees of freedom. For this distribution we have

    \mu_1' = \nu,    \mu_2 = 2\nu,    \mu_3 = 8\nu,    \mu_4 = 12\nu(\nu + 4),
    \mu_5 = 32\nu(5\nu + 12),    \mu_6 = 40\nu(3\nu^2 + 52\nu + 96),
    \mu_8 = 16\nu(105\nu^3 + 4760\nu^2 + 29{,}232\nu + 40{,}320).
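
These central moments can be verified independently. The sketch below is ours and uses a route not taken in the paper: it starts from the cumulants of the χ² distribution, κ_r = 2^(r-1) (r-1)! ν, and applies the standard expressions for central moments in terms of cumulants. The function names are illustrative.

```python
from math import factorial


def chi_square_cumulant(r, nu):
    """r-th cumulant of a chi-square variate with nu degrees of freedom."""
    return 2 ** (r - 1) * factorial(r - 1) * nu


def chi_square_central_moments(nu):
    """Central moments mu_2 ... mu_6 and mu_8, built from the cumulants."""
    k = {r: chi_square_cumulant(r, nu) for r in range(2, 9)}
    return {
        2: k[2],
        3: k[3],
        4: k[4] + 3 * k[2] ** 2,
        5: k[5] + 10 * k[3] * k[2],
        6: k[6] + 15 * k[4] * k[2] + 10 * k[3] ** 2 + 15 * k[2] ** 3,
        8: (k[8] + 28 * k[6] * k[2] + 56 * k[5] * k[3] + 35 * k[4] ** 2
            + 210 * k[4] * k[2] ** 2 + 280 * k[3] ** 2 * k[2] + 105 * k[2] ** 4),
    }


def momental_ratios(nu):
    """(beta1, beta2) of the chi-square parent population."""
    m = chi_square_central_moments(nu)
    return m[3] ** 2 / m[2] ** 3, m[4] / m[2] ** 2


for nu in (4, 12):
    print(nu, chi_square_central_moments(nu), momental_ratios(nu))
```

With integer ν all the arithmetic is exact, so the closed forms listed above can be compared term by term.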


As illustrations we take two cases (i) ν = 4 and (ii) ν = 12. For the former
β₁ = 2, β₂ = 6 whilst for the latter β₁ = 2/3, β₂ = 4. In Table 9 the resulting
values for β₁ and β₂ are given.
It will be seen that they hug the Type III line fairly closely for large
n and the Type V for low n, being all the while in the Type VI region,
which accords with the statement made earlier. In this case the
appropriate χ² approximation would, for large n, give us quite a reasonable
figure although for small n it would be better to fit a Type V or inverted
χ² with appropriate moments. It is of interest to note how the difference
between the upper 5 per cent points for a Type III and a Type V curve
with the same β₁ varies. For a β₁ of 0.05 the difference is 0.02, for β₁ equal
to 1 it is 0.04. These figures are in terms of the standard deviation of the
population as unit. The means of our distributions are always 2 so
that unless the standard deviation is large the error is likely to be small
in taking a Type III approximation with correct β₁ when the true
distribution actually lies between the Type III and Type V boundaries.
It must also be remembered that in some cases the distribution lies
above the Type III line so that by using that as the approximation we
are playing for safety and should be nearer the true figure in a large
number of cases.














TABLE 9
MOMENTAL RATIOS FOR SAMPLES FROM TYPE III POPULATION

               ν = 4                  ν = 12
   n       β₁        β₂          β₁        β₂

  10     6.0942   14.8563      3.5258    8.6772
  20     2.9334    8.6259      1.6815    5.7279
  30     1.9255    6.7097      1.1037    4.7943
  40     1.4330    5.7605      0.8214    4.3367
  50     1.1411    5.1981      0.6541    4.0650
  75     0.7561    4.4563      0.4334    3.7062
 100     0.5653    4.0888      0.3241    3.5282











10. DISTRIBUTION OF s²


In order to make some comparisons we now make a few general
remarks about the form of the distribution of s² whose moments were
given in (3.6). For the case of samples of size n from a normal
population it is known that the sample variance is distributed exactly as
χ²σ²/(n-1) where χ² has (n-1) degrees of freedom. For the cases of
sampling from a Type III population with varying amounts of skewness
and kurtosis the momental ratios are given in Table 10. On comparing
the sets of momental ratios with those of δ² for the corresponding
populations we notice a marked similarity in the way they progress
and in the order of magnitude, especially for the case of a Type III
parent population.


TABLE 10
MOMENTAL RATIOS

          Normal pop.                  Type III population
          β₁ = 0, β₂ = 3       β₁ = 2, β₂ = 6      β₁ = 2/3, β₂ = 4
   n       β₁      β₂           β₁      β₂           β₁      β₂

  10      0.89    4.33         5.41   15.52         2.55    7.49
  20      0.42    3.63         2.70    9.45         1.26    5.31
  30      0.28    3.41         1.80    7.32         0.84    4.54
  40      0.21    3.31         1.34    6.23         0.62    4.16
  50      0.16    3.25         1.08    5.58         0.51    3.95
  75      0.11    3.16         0.71    4.69         0.32    3.57
 100      0.08    3.12         0.53    4.19         0.25    3.41


11. TRANSFORMATION OF δ²


A fruitful idea would be to find some form of transformation for δ² so
that it will make the test more robust in the sense that the significance
points depend very little on the actual form of the distribution of the
original observations. One obvious possibility is to use the logarithm
of the quantity δ². It has not proved possible so far to obtain the
distribution of log δ² but we know the moments of log s² and can appeal
to the similarity of δ² and s². These moments may be obtained directly
from David [4] in terms of the population cumulants. In some instances
it is easier to apply them in this way whereas in other cases it is easier
to work in terms of the population moments. For the case of samples
from a normal distribution Bartlett and Kendall [1] have given the
exact values in terms of the derivatives of the logarithm of the gamma
function, tabulated the moments up to n equal to 20 and given working
approximations for n over 20. The values given in Table 11 are taken
from [1] for n equal to 10 and 20 whilst for n over 20 they are obtained
from a more accurate approximation than that given in [1]. For the
case of sampling from a non-normal population we give in Table 11
the values of the momental ratios for the case of samples from the two
Type III populations considered earlier. The values for n equal to 10
are not given as the approximations proved unreliable for n as low as
10. In all cases there is a considerable movement, for any given n,
towards normality and it would therefore seem feasible to use the
logarithm for a quick test of significance using normal curve factors.
Since there is such a marked similarity between s² and δ², it would
seem reasonable that the logarithmic transformation would be of use
in the case of the latter as well.


TABLE 11
MOMENTAL RATIOS OF log s²

          Normal pop.                  Type III population
                               β₁ = 2, β₂ = 6      β₁ = 2/3, β₂ = 4
   n       β₁       β₂          β₁       β₂          β₁       β₂

  10     0.2200   3.4370         -        -           -        -
  20     0.1050   3.2100      0.2728   3.4992      0.0572   3.2280
  30     0.0710   3.1287      0.0770   3.3944      0.0201   3.1528
  40     0.0525   3.0974      0.0242   3.3058      0.0098   3.1133
  50     0.0416   3.0784      0.0084   3.2453      0.0058   3.0889
  75     0.0274   3.0689      0.0005   3.1617      0.0022   3.0566
 100     0.0204   3.0396      0.0000   3.1176      0.0011   3.0444

To carry out a test of significance of δ² in this way we would take

    L = \log_e \delta^2,
    \mu_1'(L) = \log_e 2\mu_2 - \frac{(2n-3)\beta_2 + 1}{4(n-1)^2},
    \sigma(L) = \frac{\sqrt{2(2n-3)\beta_2 + 2}}{2(n-1)},    (11.1)

where β₂ and μ₂ refer to the sampled population. Then the ratio

    (L - \mu_1'(L)) / \sigma(L)

should be referred to the normal probability scale. The first moment is
accurate to order 1/n only and the second moment is obtained by using
the relation that if y = f(x) then

    \sigma_y \simeq \sigma_x \left| \frac{dy}{dx} \right|_{x=t},

t being the mean value of x.

In Table 6 the last row gives the upper 5 per cent points for δ²/σ² using
(11.1) and it will be seen that they approach the Pearson curve values
more quickly than the normal approximation. If more terms were taken
in the expressions given in (11.1), the approach would be very much
quicker but the test then loses its simplicity and thus its merit.
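
The logarithmic approximation lends itself to a very short computation. In the sketch below, which is ours, we work with δ²/σ² so that μ₂ is taken as unity, we read the first moment of (11.1) as log 2μ₂ less the correction term, and we use the one-sided 5 per cent normal deviate; the function name is illustrative.

```python
from math import exp, log, sqrt
from statistics import NormalDist


def log_approx_upper_point(n, beta2=3.0, tail=0.05):
    """Upper percentage point of delta^2 / sigma^2 from the log transformation (11.1)."""
    mean_log = log(2.0) - ((2 * n - 3) * beta2 + 1) / (4 * (n - 1) ** 2)
    sd_log = sqrt(2 * (2 * n - 3) * beta2 + 2) / (2 * (n - 1))
    deviate = NormalDist().inv_cdf(1 - tail)  # about 1.645 for the 5 per cent point
    return exp(mean_log + deviate * sd_log)


for n in (10, 20, 30, 40, 50, 75, 100):
    print(n, round(log_approx_upper_point(n), 2))
```

For a normal parent the printed figures may be compared with the last row of Table 6; other parents are handled by changing beta2.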








452 AMERICAN STATISTICAL ASSOCIATION JOURNAL, JUNE 1955 


12. ILLUSTRATIONS


In this section we will consider four examples that illustrate some of 
the points that have been made in the preceding sections although these 
by no means exhaust the possibilities. 

Example (i). Counts are being made of the intervals between
radioactive particle emissions and it is desired to estimate the variability.
The first forty intervals observed are:


5.2, 6.5, 6.7, 2.3, 1.4, 0.1, 4.6, 3.2, 3.8, 0.6, 1.2, 17.8, 0.9, 0.3, 1.2, 2.5,
0.9, 0.1, 5.2, 7.0, 1.5, 2.4, 0.1, 4.5, 0.7, 3.0, 9.3, 1.7, 4.8, 3.9, 5.1, 0.5,
0.8, 1.0, 0.2, 3.9, 2.6, 4.7, 4.1, 4.0.


Using the successive difference technique we get 11.28 as our estimate
of the population variance instead of 10.43 using s². Any hesitation we
might have in using the successive difference method of estimation
would be diminished when we realize that these counts come from an
extremely leptokurtic population, one in which the β₂ is approaching
a figure of about 9. Hence this method of estimation is 90 per cent
efficient when compared with the usual variance estimate and added to
this it is very much quicker in calculation time than the usual variance
estimate. There is the added advantage that further observations may
be immediately incorporated as they come along without any need for
fresh calculations of means and so forth.
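
The remark about incorporating further observations can be made concrete with a running accumulator. The sketch below is ours; the class name is hypothetical, and the comparison figure for s² is computed with divisor n, which is an assumption on our part since that form appears to reproduce the quoted 10.43.

```python
from statistics import pvariance

intervals = [
    5.2, 6.5, 6.7, 2.3, 1.4, 0.1, 4.6, 3.2, 3.8, 0.6, 1.2, 17.8, 0.9, 0.3, 1.2, 2.5,
    0.9, 0.1, 5.2, 7.0, 1.5, 2.4, 0.1, 4.5, 0.7, 3.0, 9.3, 1.7, 4.8, 3.9, 5.1, 0.5,
    0.8, 1.0, 0.2, 3.9, 2.6, 4.7, 4.1, 4.0,
]


class RunningSuccessiveDifference:
    """Keeps only the last value, the count and the sum of squared differences."""

    def __init__(self):
        self.count = 0
        self.last = None
        self.sum_squared_differences = 0.0

    def add(self, value):
        if self.last is not None:
            self.sum_squared_differences += (value - self.last) ** 2
        self.last = value
        self.count += 1

    def estimate(self):
        """Half the mean square successive difference of the values seen so far."""
        if self.count < 2:
            raise ValueError("need at least two observations")
        return self.sum_squared_differences / (2 * (self.count - 1))


running = RunningSuccessiveDifference()
for value in intervals:
    running.add(value)

print(round(running.estimate(), 2))     # successive difference estimate
print(round(pvariance(intervals), 2))   # sample variance with divisor n
```

A forty-first interval would be absorbed by a single further call to add, with no recomputation of the mean.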

Example (ii). In Table 12 below is given a sample of 40 individuals
drawn from a population believed to be normal and an estimate of the
variance of that population is required. The values, in order of
occurrence, go down the columns.

TABLE 12
RANDOM SAMPLE FROM NORMAL POPULATION WITH TREND

    10.52    14.69     7.10     9.73    13.66
     7.80    11.52     9.00    12.25    13.85
     6.86    12.68    14.55     8.65    16.20
     9.82    13.45    13.34    13.84    11.80
    10.04    14.57     9.56    11.23    11.37
     9.83     9.94     9.82    12.38    13.79
     8.71    13.66    11.57    13.42     9.93
    10.85    10.73    13.64    11.57    11.19

From these figures we obtain straightforwardly

    Mean           = 11.378
    Variance (s²)  =  4.586
    ½δ²            =  3.564

and we notice that the latter gives a very much smaller estimate than
the usual variance estimate. In actual fact the above series was
obtained artificially as follows. Using the tables of random normal
deviates due to Wold [17] a sample of size 40 was drawn with mean 10 and
variance 3.133. To this series was added a trend which increased the
rth value by an amount 0.05r. Thus over the whole series of 40
observations there was a trend of 1.13 times the standard deviation or 0.03
of a standard deviation between each successive observation. From the
expressions given in Section 5 we find

    μ₁′(½δ²) = 3.351,    μ₂(½δ²) = 0.856,
    μ₁′(s²)  = 3.641,    μ₂(s²)  = 0.676.

Further whilst there is no increase in μ₂(½δ²) as against the case where
there is no trend present, there is a percentage increase of 17 per cent in
μ₂(s²) under the same conditions. For μ₁′, ½δ² has a percentage increase of
only 0.04 per cent as against 8.69 per cent in the case of s². The estimate
using ½δ² in this case is still high but it is well within the sampling
fluctuations we can expect to find in such a sample.

Example (iii). In Table 13 we give the resistance in megohms of 
50 pieces of electrical insulating material. All the figures have been 
divided by 5 and given an arbitrary origin of 900 for the purposes of 
clarifying the discussion. The data is taken from Shewhart [13] and the 
order of occurrence of the items is down the columns. 

Let us suppose that some form of quality control chart was being 
put into operation based on these figures. We find 


Mean = 1.70 
Variance (s?) = 9,975.00 
and hence 
Standard deviation = 99.87. 
TABLE 13 
RESISTANCES OF MATERIALS (ARBITRARY UNITS AND ORIGIN) 











109 —171 27 28 58 
- & — 148 44 79 — 32 
— 3¢ — 240 62 58 79 
— 105 — 163 13 69 250 
— 42 — 207 —18 40 48 
- 140 —87 20 100 
- § 120 13 —78 79 
— 43 27 138 —18 — 49 
— 104 120 45 — 64 — 66 
—115 190 28 58 — 130 
Plotting the figures on a graph, however, shows a marked pattern and
there appears to be a change in mean value over the period. If we now
compute an estimate of the population variance using δ² we obtain the
estimate 4256.3 giving a standard deviation of 65.24. This difference
would have a profound effect on any control limits that we might put
on the chart. For instance suppose that we decided to use the means
of four items as our criterion then using the 99 per cent probability
factors from the normal tables our limits come out to be

    Using s²    Upper 130.3    Lower -126.9
    Using δ²    Upper  85.7    Lower  -82.3.

Further the means of the first twelve groups of four from Table 13 are

    -14, -25.5, -134.5, -117.5, 114.25, 36.5, 11.5, 45, 46.75, -25.5,
    88.75, 44.5.

If we use the control limits obtained from the sample variance, we find
one mean only, namely -134.5, outside the limits. The other limits
however give four means outside, namely -134.5, -117.5, 114.25 and
88.75. Thus a very different picture arises and a decision must be made
as to which should be regarded as the appropriate model. In practice
we would like to have a very long series of observations known to be
in statistical "control" before computing a variance for purposes of
control limits. Until this stage is reached, however, it seems that we
can arrive at very different conclusions purely by considering our
material from different viewpoints and in general it would be more
usual to adopt the cautious view, either by using some estimator based
on the dispersion in small sub groups or else by using a successive
difference method.
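
The two sets of limits, and the group means falling outside them, can be recomputed from the summary figures quoted above. The following sketch is ours; it assumes two-sided 99 per cent normal limits for means of four, centred on the overall mean 1.70, and the function names are illustrative.

```python
from statistics import NormalDist


def control_limits(centre, standard_deviation, group_size, coverage=0.99):
    """Upper and lower limits for means of group_size items under normal theory."""
    deviate = NormalDist().inv_cdf(0.5 + coverage / 2)  # about 2.576 for 99 per cent
    half_width = deviate * standard_deviation / group_size ** 0.5
    return centre + half_width, centre - half_width


def means_outside(means, limits):
    upper, lower = limits
    return [m for m in means if m > upper or m < lower]


group_means = [-14, -25.5, -134.5, -117.5, 114.25, 36.5,
               11.5, 45, 46.75, -25.5, 88.75, 44.5]

limits_s2 = control_limits(1.70, 99.87, 4)      # standard deviation from s^2
limits_delta = control_limits(1.70, 65.24, 4)   # standard deviation from successive differences

print([round(v, 1) for v in limits_s2], means_outside(group_means, limits_s2))
print([round(v, 1) for v in limits_delta], means_outside(group_means, limits_delta))
```

The narrower limits flag four of the twelve group means, against one for the limits based on s².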

Example (iv). From a table of random numbers the following
sequence of 100 two digit numbers was obtained, the order being across
the rows.


07, 94, 31, 40, 57, 79, 06, 72, 99, 23, 78, 61, 12, 39, 63, 76, 74, 18, 69, 29,
82, 08, 30, 36, 82, 38, 54, 81, 62, 15, 59, 92, 27, 29, 28, 97, 30, 35, 21, 39,
09, 91, 37, 55, 60, 19, 04, 67, 47, 74.


These numbers form part of a table which has already been tested for
the frequency of occurrence of the various digits and we are interested
here in testing whether the order of occurrence is random or not. We
find δ²/σ² to be equal to 2.3155, σ² being calculated theoretically for this
particular case of a rectangular distribution. The significance level
corresponding to this value for δ²/σ² may be obtained in one of three
ways. First we may use the χ² approximation given in Section 6
assuming the underlying population to be normal. Secondly we may use the
modification given in (7.2) which assumes an underlying rectangular
population and finally we may modify (7.2) with a three moment fit.
These three methods give rise to the following levels of significance

    Normal basis                         0.250
    Rectangular basis (Eqn. 7.2)         0.201
    Rectangular basis (3 moment fit)     0.202

and thus although not significant by any method the use of the correct
form of distribution brings about a considerable change in the
significance level. This difference can of course prove very important. For
example a page was picked out of a telephone directory at random and
two digit numbers formed by taking the middle two of the four digits
in each of the numbers. Fifty such numbers were formed and δ²/σ²
came to be 1.2847 where σ² was calculated, as before, on the basis of a
rectangular distribution.

The significance levels corresponding to this result are 0.061 using
the normal curve basis and 0.021 using the rectangular basis and a very
different conclusion would no doubt be drawn from the latter figure
that would be completely overlooked by the former.


13. CONCLUSIONS 


The mean square successive difference, δ², can be used as a measure of
dispersion in situations where it is desired to eliminate any effects that
may cause a gradual drift in the mean value of the sampled population.
Calculations show that although δ² will never be as efficient an estimator
as the sample variance, s², in situations where no drift occurs,
nevertheless as the kurtosis of the sampled population increases the relative
efficiency of δ² increases very rapidly. When some trend does exist, the
bias in δ² is negligible compared with that of s² and may hardly change at
all.

The actual distribution of δ² is also of interest in that it may be used
where we desire to make a test for homogeneity. Here we find that the
form of the parent population does change the form of the distribution
of δ² to quite a large extent but that it is always somewhere in the
neighbourhood of a Pearson Type III curve. A suitable approximation
to the four moment type of solution can be made for most practical
purposes by using a χ² distribution with the first two moments in
agreement.

We then illustrate how the distribution of s², judged by its momental
ratios, has a form very similar to that of δ². A logarithmic
transformation is of use for the former in that it brings it to approximate normality
for lower values of sample size and this suggests that a similar
transformation would be of use with δ² to give a quick significance test.

Finally the methods described are illustrated on a number of
examples. These bring out the differences that exist between the properties
of estimating variance by successive differences from normal and
non-normal populations and also how the significance points for δ² vary
according to the basic underlying distribution.

SAMPLING PLANS FOR INSPECTION BY VARIABLES*


GERALD J. LIEBERMAN AND GEORGE J. RESNIKOFF 
Stanford University 


I. INTRODUCTION


IN LOT-BY-LOT acceptance sampling by attributes each item of a
sample drawn from a lot of manufactured items is classified simply
as defective or non-defective. A random sample is drawn from the lot 
and the lot is either accepted or rejected depending solely upon the 
number of defectives in the sample. Inspection procedures by variables 
are based on the measurement of a variable quality characteristic, and 
the decision to accept or reject the lot is a function of these measure- 
ments (as opposed to the number of defectives). Variables inspection is 
applicable whenever the testing of individual items involves measure- 
ment and the form of the distribution is known. Since inspection by 
variables, when it is applicable, makes greater use of the information 
concerning the lot than does inspection by attributes, variables plans 
require smaller sample sizes than attributes plans furnishing the same 
protection. Variables sampling plans pertain to a single quality charac- 
teristic, and it is usually assumed that measurements of this quality 
characteristic are independent, identically distributed normal random 
variables. Such an assumption will be made throughout the paper. 

For the purposes of this paper, sampling inspection by variables is 
divided into three categories, known standard deviation plans, un- 
known standard deviation plans, and average range plans. Known 
standard deviation plans are based upon the sample mean and the 
known standard deviation. Unknown standard deviation plans are 
based upon the sample mean and the sample standard deviation. 
Average range plans are based upon the sample mean and the average 
range in subsamples. 

This paper presents a matching collection of variables sampling
plans based on known standard deviation, unknown standard deviation,
and average range. Each operating characteristic curve shown
represents a variables sampling plan of any one of the three types. In
other words, if the user chooses an OC curve he has at his disposal the
choice of the three types of plans, all guaranteeing essentially the
same protection.








* This work was performed under the sponsorship of the Office of Naval Research. The authors are
indebted to Albert H. Bowker for invaluable aid.

For the three types of plans to have approximately the same OC
curve it is necessary in some cases to use a different sample size for each
type. Therefore, it is not possible to index the plans by sample size.
Instead the plans are indexed by code letter and Acceptable Quality
Level (AQL). The AQL may be defined as that quality level considered 
acceptable as a process average. Each combination of code letter and 
AQL refers to a single OC curve which is used as the OC curve for any 
one of the three types of plan. The probability of acceptance at the 
AQL varies from .89 for the code letter representing the smallest 
sample sizes to .99 for the code letter representing the largest sample 
sizes. This variation in the probability of acceptance at the AQL follows
the practice of Military Standard 105 A¹ [9].

Like attribute plans these variables plans involve measurement of 
lot quality by the per cent defective. Previous tables of variables 
plans involve measurement of lot quality by the average and vari- 
ability of the measurements. With the plans given here it is not neces- 
sary to shift to attributes sampling whenever per cent defective is the 
appropriate measure. 

All variables sampling procedures for two-sided specification limits 
suffer from the fact that the probability of accepting a submitted lot 
with given per cent defective p does not depend on p alone but on the 
division of p into two components, the per cent lying above the upper 
specification limit and the per cent lying below the lower specification 
limit. For this reason a two-sided procedure does not have a unique OC 
curve but rather a band of curves, each curve within the band representing
a possible division of p. With the two-sided procedure² given
in this paper this band is so narrow as to be for all practical purposes a 
single curve. Since the OC curve for the one-sided test is contained 
within this narrow band, it is used as the OC curve for the two-sided 
procedure. 

II. GENERAL INSPECTION CRITERIA 


Associated with each inspection characteristic are the design specifi- 
cations. If only an upper specification limit U is given, the item is con- 
sidered defective if its measurement exceeds U. If only a lower specifi- 
cation L is given, the item is considered defective if its measurement 
is smaller than L. If both upper and lower limits are specified, the item 
is considered defective if its measurement either exceeds U or is smaller 
than L. 


1 Military Standard 105 A, usually referred to as MIL-STD 105 A, is a U.S. Department of Defense
document containing acceptance sampling plans by attributes.

2 The graphical two-sided procedure for unknown standard deviation plans given by Bowker and
Goode [2] is equivalent to the two-sided procedure for unknown standard deviation plans presented in
this paper.

If the per cent defective of a submitted lot is known, no sampling
inspection is necessary to determine whether or not the lot is to be
accepted. If the per cent defective is sufficiently small, the lot is
accepted; otherwise it is rejected. Since such knowledge about the per
cent defective is rare, a logical procedure is to estimate the per cent
defective from a sample, and accept or reject the lot on the basis of this
estimate. A sampling plan is then described as consisting of the sample
size n, a method of estimating the per cent defective, and a “maximum
allowable estimated per cent defective,” p*. If only an upper specification
limit, U, is given, the estimate of the percentage above this limit,
$\hat{p}_U$, is obtained from the sample of size n. If $\hat{p}_U \le p^*$, the lot is accepted.
If only a lower specification limit, L, is given, the estimate of the
percentage below this limit, $\hat{p}_L$, is obtained from the sample of size n.
If $\hat{p}_L \le p^*$, the lot is accepted. If a double specification limit is given,
both $\hat{p}_U$ and $\hat{p}_L$ are computed. If $\hat{p} = \hat{p}_U + \hat{p}_L \le p^*$, the lot is accepted.


III. KNOWN STANDARD DEVIATION PLANS 


In this section, it will be assumed that the standard deviation $\sigma$ of
the measurements is known. A complete set of sampling plans indexed
according to AQL and code letter³ is given in Table I. Operating
characteristic curves for these plans are the graphs of Figures 1-16.

Before presenting the general inspection procedure for known standard
deviation plans, some discussion about two-sided specifications is
in order. For this case information is available about the per cent
defective in the lot before a sample is drawn. There is a sufficiently small
value of $(U-L)/\sigma$, which is a function only of the AQL, that assures
incoming quality worse than the AQL. Consequently, before sampling,
$(U-L)/\sigma$ should be calculated and the lot rejected (with no sample
taken) if this value is smaller than the minimum value of $(U-L)/\sigma$,
on the grounds that the incoming quality is poorer than the AQL.
Minimum values of $(U-L)/\sigma$ can be found in Table I. Intuitively,
this implies that the combination of narrow specification limits and
large standard deviation militates against the submittal of good quality.
The inspection procedure is as follows:

1. Draw a random sample of n items and compute

$$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i .$$








3 The code letters run from B to Q, omitting A. This was done so that these tables would resemble
the tables in MIL-STD 395 A. 

2. a. For an upper specification limit compute

$$C_U' = \left(\frac{U-\bar{x}}{\sigma}\right) v, \qquad \text{where } v = \sqrt{\frac{n}{n-1}} .$$

b. For a lower specification limit compute

$$C_L' = \left(\frac{\bar{x}-L}{\sigma}\right) v, \qquad \text{where } v = \sqrt{\frac{n}{n-1}} .$$

c. For a double specification limit, compute both

$$C_U' = \left(\frac{U-\bar{x}}{\sigma}\right) v \qquad \text{and} \qquad C_L' = \left(\frac{\bar{x}-L}{\sigma}\right) v$$

if $(U-L)/\sigma$ is greater than the minimum allowable value
found in Table I. Otherwise, the lot is rejected before a sample
is drawn.

3. Enter Table II with $C_U'$ and/or $C_L'$ and read out $\hat{p}_U$ and/or $\hat{p}_L$,
whichever is applicable.⁴

4. For an upper specification limit accept the lot if $\hat{p}_U \le p^*$. For a
lower specification limit accept the lot if $\hat{p}_L \le p^*$. For a double
specification limit accept the lot if $\hat{p}_U + \hat{p}_L \le p^*$.

Example. The specified minimum yield point for certain steel castings 
is 55,000 pounds per square inch. The standard deviation is known to 
be $\sigma = 3{,}000$ psi. A 1% AQL plan is to be used with a sample size of 6.
The yield points of the sample specimens are 


62,000; 61,000; 68,500; 59,500; 65,500; 63,900. 




















The following are computed from the data:
1. $\bar{x} = 63{,}400$.
2b. $C_L' = \big((63{,}400 - 55{,}000)/3{,}000\big)(1.095) = 3.07$.
3. From Table II $\hat{p}_L = .107\%$.
4. From Table I p* is 2.57%. Hence the lot is accepted.
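
The arithmetic of this example can be checked with a short sketch. It assumes, following Section VI-2, that the Table II entry is the standard normal tail area beyond $C'$; Table II itself is not reproduced here, so the function names (`normal_upper_tail`, `known_sigma_estimate`) and the use of the error function in place of the table are illustrative choices, not part of the published procedure.

```python
import math


def normal_upper_tail(c):
    """Area under the standard normal curve to the right of c."""
    return 0.5 * math.erfc(c / math.sqrt(2.0))


def known_sigma_estimate(sample, sigma, lower=None, upper=None):
    """Return (xbar, p_hat) for a known-sigma plan.

    p_hat is the estimated fraction defective, computed as the normal
    tail area beyond C' = v * (distance from xbar to the limit) / sigma,
    with v = sqrt(n / (n - 1)). This stands in for the Table II lookup.
    """
    n = len(sample)
    xbar = sum(sample) / n
    v = math.sqrt(n / (n - 1))
    p_hat = 0.0
    if upper is not None:
        p_hat += normal_upper_tail(v * (upper - xbar) / sigma)
    if lower is not None:
        p_hat += normal_upper_tail(v * (xbar - lower) / sigma)
    return xbar, p_hat


# Data of the steel-castings example; p* = 2.57% is the value quoted from Table I.
yield_points = [62000, 61000, 68500, 59500, 65500, 63900]
xbar, p_hat = known_sigma_estimate(yield_points, sigma=3000, lower=55000)
p_star = 0.0257
accept = p_hat <= p_star
print(f"xbar = {xbar:.0f}, p_hat = {100 * p_hat:.3f}%, accept = {accept}")
```

The computed estimate agrees with the tabled .107% to within rounding of $C_L'$.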


IV. UNKNOWN STANDARD DEVIATION PLANS 


In this section it will be assumed that the standard deviation of the 
measurement is unknown. A complete set of sampling plans indexed 
according to AQL and code letter is given in Table III. Operating 
characteristics curves for these plans are provided in the graphs of 
Figures 1-16. 





4 It is shown in Section VI-3 that these estimates are the uniformly minimum variance unbiased 
estimates of the true per cent defective. 

The inspection procedure is as follows:

1. Draw a random sample of n items and compute

$$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i \qquad \text{and} \qquad s = \sqrt{\frac{\sum_{i=1}^{n}(x_i-\bar{x})^2}{n-1}} .$$

2. a. For an upper specification limit, compute $C_U = (U-\bar{x})/s$.
b. For a lower specification limit, compute $C_L = (\bar{x}-L)/s$.
c. For a double specification limit, compute both $C_U = (U-\bar{x})/s$
and $C_L = (\bar{x}-L)/s$.

3. Enter Table IV with $C_U$ and/or $C_L$ and read out $\hat{p}_U$ and/or $\hat{p}_L$,
whichever is applicable.⁵

4. For an upper specification limit accept the lot if $\hat{p}_U \le p^*$. For a
lower specification limit accept the lot if $\hat{p}_L \le p^*$. For a double
specification limit accept the lot if $\hat{p}_U + \hat{p}_L \le p^*$.

Example. The specifications for electrical resistance of a certain
electrical component are U = 660 ohms and L = 620 ohms. A .4% AQL plan
is to be used with a sample of size 10. The data are as follows:

639, 640, 650, 647, 662, 637, 652, 643, 657, 649.

The following are computed from the data:

1. $\bar{x} = 647.6$, $s = \sqrt{65.38} = 8.04$.

2c. $C_L = (647.6 - 620)/8.04 = 3.44$, $C_U = (660 - 647.6)/8.04 = 1.54$.

3. From Table IV

$\hat{p}_L = 0$,
$\hat{p}_U = 5.31\%$,
$\hat{p} = \hat{p}_L + \hat{p}_U = 5.31\%$.

4. From Table III

p* = 1.30%.

Since $\hat{p} > p^*$, the lot is rejected. For possible review board evidence,
the best estimate of the per cent defective in the lot is 5.31%. It is
evident that the total estimated per cent defective is due almost
entirely to items exceeding the upper specification limit.
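
As a rough check on this example, the Table IV lookup can be imitated with the symmetric Beta integral derived in Section VI-4. The sketch below is illustrative only: the function names are hypothetical, and the Beta integral is evaluated by Simpson's rule rather than read from the published table. Computing s directly from the ten listed readings gives about 8.09 rather than the printed 8.04, so the sketch's estimate (about 5.4%) differs slightly from the tabled 5.31%; the decision to reject is unchanged.

```python
import math
import statistics


def symmetric_beta_cdf(x, a, steps=2000):
    """CDF at x of the Beta(a, a) distribution, by Simpson's rule.

    The substitution z = sin(theta)**2 removes the endpoint singularity
    that would otherwise occur for a < 1 (sample size 3).
    """
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    theta_max = math.asin(math.sqrt(x))
    h = theta_max / steps  # steps must be even for Simpson's rule

    def integrand(theta):
        return 2.0 * (math.sin(theta) * math.cos(theta)) ** (2.0 * a - 1.0)

    total = integrand(0.0) + integrand(theta_max)
    for i in range(1, steps):
        total += (4.0 if i % 2 else 2.0) * integrand(i * h)
    log_norm = math.lgamma(2.0 * a) - 2.0 * math.lgamma(a)
    return math.exp(log_norm) * total * h / 3.0


def p_hat_one_side(c, n):
    """Estimated fraction beyond one limit; c is (U - xbar)/s or (xbar - L)/s."""
    bound = max(0.0, 0.5 - c * math.sqrt(n) / (2.0 * (n - 1)))
    return symmetric_beta_cdf(bound, n / 2.0 - 1.0)


# Data of the resistance example; p* = 1.30% is the value quoted from Table III.
resistances = [639, 640, 650, 647, 662, 637, 652, 643, 657, 649]
n = len(resistances)
xbar = statistics.mean(resistances)
s = statistics.stdev(resistances)  # divisor n - 1, as in step 1
p_hat = p_hat_one_side((660 - xbar) / s, n) + p_hat_one_side((xbar - 620) / s, n)
p_star = 0.0130
print(f"xbar = {xbar:.1f}, s = {s:.2f}, p_hat = {100 * p_hat:.2f}%, "
      f"reject = {p_hat > p_star}")
```

Feeding the printed value $C_U = 1.54$ to `p_hat_one_side` reproduces the tabled 5.31%.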


V. AVERAGE RANGE PLANS 


In this section it will be assumed that the standard deviation of the
measurement is unknown. A complete set of sampling plans indexed
according to AQL and code letter is given in Table V. Operating
characteristic curves for these plans are provided in the graphs of Figures
1-16.

5 It is shown in Section VI-4 that these estimates are the uniformly minimum variance unbiased
estimates of the true per cent defective.

The inspection procedure is as follows: 

1. Draw a random sample of n items and group the measurements
into k subgroups of 5 (where n = 5k), with the exception of sample
sizes of 3, 4, and 7. For these sample sizes use a single subgroup
of 3, 4, and 7, respectively. For each subgroup compute the range
($R_i$). Determine the average range

$$\bar{R} = \Big(\sum_{i=1}^{k} R_i\Big)\Big/ k .$$

Calculate

$$\bar{x} = \Big(\sum_{i=1}^{n} x_i\Big)\Big/ n .$$


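2. a. For an upper specification limit, compute

$$C_U'' = \left(\frac{U-\bar{x}}{\bar{R}}\right) h ;$$

b. For a lower specification limit, compute

$$C_L'' = \left(\frac{\bar{x}-L}{\bar{R}}\right) h ;$$

c. For a double specification limit, compute both

$$C_U'' = \left(\frac{U-\bar{x}}{\bar{R}}\right) h \qquad \text{and} \qquad C_L'' = \left(\frac{\bar{x}-L}{\bar{R}}\right) h ,$$

where h⁶ is a factor found in Table V.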

3. Enter Table VI with $C_U''$ and/or $C_L''$ and read out $\hat{p}_U$ and/or
$\hat{p}_L$, whichever is applicable.

4. For an upper specification limit accept the lot if $\hat{p}_U \le p^*$. For a
lower specification limit accept the lot if $\hat{p}_L \le p^*$. For a double
specification limit accept the lot if $\hat{p}_U + \hat{p}_L \le p^*$.

Example. The maximum temperature of operation for a certain device
is specified as 180°. A 4% AQL plan is to be used with a sample of
size 15. The sample items have operative temperatures of

178, 175, 174, 158, 172,    $R_1 = 20$,
177, 166, 172, 167, 163,    $R_2 = 14$,
174, 173, 162, 182, 170,    $R_3 = 20$.





6 h is a factor found in Table V which is a function of the sample code letter.

The following are computed from the data:
1. $\bar{x} = 170.87$.
2a. $C_U'' = \big((180 - 170.87)/18\big)(2.394) = 1.21$.
3. $\hat{p}_U = 11.08\%$.
4. From Table VI
p* = 9.61%.


Since $\hat{p}_U > p^*$ the lot is rejected.
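
The subgroup ranges and the statistic $C_U''$ of this example can be reproduced with a few lines. This is only a sketch of steps 1 and 2: the factor h = 2.394 is taken from the example itself, the function name is hypothetical, and the final lookup of $\hat{p}_U$ still requires Table VI, which is not reproduced here.

```python
def average_range_statistic(sample, upper, h, subgroup_size=5):
    """Return (xbar, ranges, r_bar, c_upper) for an average-range plan.

    The sample is split into consecutive subgroups, the range of each
    subgroup is taken, and C_U'' = h * (U - xbar) / R_bar is formed.
    """
    groups = [sample[i:i + subgroup_size]
              for i in range(0, len(sample), subgroup_size)]
    ranges = [max(g) - min(g) for g in groups]
    r_bar = sum(ranges) / len(ranges)
    xbar = sum(sample) / len(sample)
    c_upper = h * (upper - xbar) / r_bar
    return xbar, ranges, r_bar, c_upper


temperatures = [178, 175, 174, 158, 172,
                177, 166, 172, 167, 163,
                174, 173, 162, 182, 170]
xbar, ranges, r_bar, c_upper = average_range_statistic(
    temperatures, upper=180, h=2.394)
print(f"ranges = {ranges}, R_bar = {r_bar}, xbar = {xbar:.2f}, "
      f"C_U'' = {c_upper:.2f}")
```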


VI-1. GENERAL THEORY⁷


Blackwell [1] has shown that if x has density $p_\theta(x)$, g is any unbiased
estimate of $\theta$, and T is a sufficient statistic for $\theta$, then $E(g \mid T)$ is also
unbiased, and furthermore, has a variance no greater than that of g.
Lehmann and Scheffé [8] have extended this result and have shown that
if T is complete,⁸ then every estimable function⁹ $h(\theta)$ possesses an
unbiased estimate with uniformly smallest variance and this estimate is
the unique unbiased estimate of $h(\theta)$ which is a function of T.
Heuristically, saying that a statistic T is complete means that if there is a
function of T which has expected value equal to zero, then this function must be
identically zero for all values of T for which the function has positive
probability.

It has already been shown [8] that the sufficient statistics for the 
normal distribution are complete. Furthermore, it is evident that there 
exists an unbiased estimate of the fraction of a normal population 
lying outside a fixed interval, e.g., the fraction of independent observa- 
tions in a sample from this normal population which lie outside this 
interval. Consequently, the fraction of a normal population lying out- 
side a fixed interval (p) is an estimable function and has an unbiased 
estimate with uniformly smallest variance, and this estimate is the 
unique unbiased estimate of p which is a function of the sufficient 
statistics. 

We are concerned with estimating the parameter

$$p = 1 - \int_L^U \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-(1/2)\left((x-\mu)/\sigma\right)^2}\, dx .$$

Two cases will be considered; namely, (1) $\mu$ unknown, $\sigma$ known; (2)
both $\mu$ and $\sigma$ unknown.





7 The idea of using the uniformly minimum variance estimate of the per cent defective as the test
statistic for two-sided variables acceptance procedures is due to Albert H. Bowker, Lincoln Moses, and
Herman Rubin.

8 A statistic T (more properly a family of distributions of T) is said to be complete if $E_\theta[f(T)] = 0$
for all $\theta \in \Omega$ implies $f(T) = 0$ except possibly on a set N for which $p_\theta(N) = 0$ for all $\theta \in \Omega$.

9 Given a random variable x with density $p_\theta(x)$, a function $h(\theta)$ is said to be estimable if there exists
a function g(x) such that $E_\theta[g(x)] = h(\theta)$.

Let $x_1, x_2, \ldots, x_n$ be independent random variables from the above
normal distribution. Define $p'$ as the usual attribute estimate of the
fraction defective, i.e., the ratio of the number of defective items to the
sample size. It is evident that $p'$ is unbiased. Hence it follows from
Blackwell's [1] and Lehmann and Scheffé's [8] results that $\hat{p} = E(p' \mid T)$ is
the unique uniformly minimum variance (UMV) unbiased estimate of
p, where T are the sufficient statistics for the normal distribution.
Since $p'$ is the average of independent identically distributed random
variables taking on the values 0 and 1, it is evident that $\hat{p} = E(p' \mid T)$
is equivalent to $E(\tilde{p} \mid T)$ where $\tilde{p}$ is defined as follows: Let y be any one
of the observations $(x_1, x_2, \ldots, x_n)$, say $x_1$. Then

$$\tilde{p}(y, x_2, x_3, \ldots, x_n) = \begin{cases} 0, & \text{if } L \le y \le U, \\ 1, & \text{otherwise.} \end{cases} \tag{1}$$


VI-2. THE ESTIMATE WHEN THE POPULATION VARIANCE IS KNOWN 


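If the observations are drawn from a normal population with unknown
mean $\mu$ and known variance $\sigma^2$, then

$$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$$

is a sufficient statistic. In this case $\hat{p} = E(\tilde{p} \mid \bar{x})$ is the UMV unbiased
estimate of p. If $\tilde{p}$ is the estimate (1), then

$$\hat{p}(\bar{x}) = E(\tilde{p} \mid \bar{x}) = \Pr\{\tilde{p} = 1 \mid \bar{x}\} = 1 - \Pr\{L \le y \le U \mid \bar{x}\} = 1 - \int_L^U \frac{g(y, \bar{x})}{h(\bar{x})}\, dy ,$$

where $g(y, \bar{x})$ is the joint probability density of y and $\bar{x}$, and $h(\bar{x})$ is the
probability density of $\bar{x}$.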
Consider the joint probability density of y and

$$\bar{x}' = \sum_{i=2}^{n} \frac{x_i}{n-1} ,$$

which is

$$\frac{\sqrt{n-1}}{2\pi\sigma^2}\, e^{-(1/2\sigma^2)\{(y-\mu)^2 + (n-1)(\bar{x}'-\mu)^2\}} .$$
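
The transformation

$$\bar{x} = \frac{(n-1)\bar{x}' + y}{n}, \qquad y = y$$

leads to

$$g(y, \bar{x}) = \frac{n}{2\pi\sigma^2\sqrt{n-1}}\, e^{-(n/2\sigma^2)\{(\bar{x}-\mu)^2 + (y-\bar{x})^2/(n-1)\}}$$

and division by

$$h(\bar{x}) = \frac{\sqrt{n}}{\sqrt{2\pi}\,\sigma}\, e^{-n(\bar{x}-\mu)^2/2\sigma^2}$$

results in

$$\frac{g(y, \bar{x})}{h(\bar{x})} = \sqrt{\frac{n}{n-1}}\, \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-n(y-\bar{x})^2/2\sigma^2(n-1)} .$$

Therefore

$$\hat{p}(\bar{x}) = 1 - \int_L^U \sqrt{\frac{n}{n-1}}\, \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-n(y-\bar{x})^2/2\sigma^2(n-1)}\, dy$$

$$= \int_{-\infty}^{\sqrt{n/(n-1)}\,(L-\bar{x})/\sigma} \frac{1}{\sqrt{2\pi}}\, e^{-t^2/2}\, dt + \int_{\sqrt{n/(n-1)}\,(U-\bar{x})/\sigma}^{\infty} \frac{1}{\sqrt{2\pi}}\, e^{-t^2/2}\, dt .$$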

VI-3. THE KNOWN-σ ACCEPTANCE CRITERION


The acceptance procedure is formulated as follows: Accept a lot of
items if in the sample $\hat{p} \le p^*$, where p* is so chosen that if the population
percentage defective is p, i.e., if the portion of the population lying
outside (L, U) is p, then the probability of acceptance will be $L_p$. It
is shown that in the one-sided case this is equivalent to the well-known
and widely used procedure: Accept if $\bar{x} \le U - k\sigma$.


Let $K_p$ be defined by

$$\int_{K_p}^{\infty} \frac{e^{-t^2/2}}{\sqrt{2\pi}}\, dt = p .$$
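
In the one-sided case, when $L = -\infty$, then

$$\hat{p} = \int_{\sqrt{n/(n-1)}\,(U-\bar{x})/\sigma}^{\infty} \frac{1}{\sqrt{2\pi}}\, e^{-t^2/2}\, dt \le p^*$$

implies that

$$\sqrt{\frac{n}{n-1}}\; \frac{U-\bar{x}}{\sigma} \ge K_{p^*} \qquad \text{or} \qquad \bar{x} \le U - \sqrt{\frac{n-1}{n}}\, K_{p^*}\, \sigma .$$

If we take

$$k = \sqrt{\frac{n-1}{n}}\, K_{p^*} ,$$

the OC curves of the two acceptance procedures will be identical.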
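
The equivalence of the two one-sided procedures can be illustrated numerically. The sketch below uses Python's `statistics.NormalDist` for the normal integral; the helper names and the particular values of U, σ, n and p* are assumptions chosen for illustration and are not taken from the tables.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()


def K(p):
    """Upper-tail point K_p: the area above K_p under the normal curve is p."""
    return STD_NORMAL.inv_cdf(1.0 - p)


def p_hat_upper(xbar, upper, sigma, n):
    """Known-sigma estimate of the fraction above the upper limit."""
    return 1.0 - STD_NORMAL.cdf(math.sqrt(n / (n - 1)) * (upper - xbar) / sigma)


def k_factor(p_star, n):
    """k = sqrt((n - 1) / n) * K_{p*}."""
    return math.sqrt((n - 1) / n) * K(p_star)


# Illustrative values only (hypothetical upper limit and plan constants).
upper, sigma, n, p_star = 70000.0, 3000.0, 6, 0.0257
k = k_factor(p_star, n)

# The two criteria give the same decision at every trial value of xbar.
trial_means = [60000.0 + 500.0 * i for i in range(21)]
same_decision = all(
    (p_hat_upper(m, upper, sigma, n) <= p_star) == (m <= upper - k * sigma)
    for m in trial_means
)
print(f"k = {k:.3f}, decisions agree: {same_decision}")
```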
For the case where there is a double specification limit and the standard
deviation is known, it is evident that if

$$\frac{U-L}{\sigma} < 2K_{\mathrm{AQL}/2} ,$$

the incoming value of p, the per cent defective, is greater than the AQL.
Consequently, the lot should be rejected without a sample being drawn.
The OC curve of this two-sided test takes the form of a band, the lower
bound of which is the one-sided OC curve. For incoming quality which
is no worse than the AQL the upper bound corresponds to equal division
of the per cent defective. The band here is relatively wide, but
this is desirable since this gives “good” quality a better chance of
acceptance. For quality worse than the AQL, the above restriction on
$(U-L)/\sigma$ eliminates certain divisions, including equal division, so
that the band quickly tends to the one-sided curve. A diagram of the
OC band is shown at the top of the next page.


VI-4. THE ESTIMATE OF p WHEN THE POPULATION VARIANCE IS UNKNOWN


If the observations are drawn from a normal population with unknown
mean $\mu$ and unknown variance $\sigma^2$, then the pair of sample
values $\bar{x}$ and $\sum (x_i - \bar{x})^2$ are sufficient statistics. In this case
$\hat{p}(\bar{x}, S^2) = E(\tilde{p} \mid \bar{x}, S^2)$ where

$$\bar{x} = \sum_{i=1}^{n} \frac{x_i}{n}, \qquad S^2 = \sum_{i=1}^{n} (x_i - \bar{x})^2 .$$








[Figure: OC band for the two-sided known-σ procedure. Vertical axis: probability of acceptance; horizontal axis: per cent defective, with the AQL marked.]


If $\tilde{p}$ is the estimate (1), then

$$\hat{p}(\bar{x}, S^2) = \Pr\{\tilde{p} = 1 \mid \bar{x}, S^2\} = 1 - \int_L^U \frac{f(y, \bar{x}, S^2)}{h(\bar{x}, S^2)}\, dy ,$$

where $f(y, \bar{x}, S^2)$ is the joint probability density of y, one of the
observations in the sample, the sample mean $\bar{x}$, and the sample sum of
squares $S^2$, and $h(\bar{x}, S^2)$ is the joint probability density of $\bar{x}$ and $S^2$.

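It is well known that $h(\bar{x}, S^2)$ is given by

$$h(\bar{x}, S^2) = \frac{\sqrt{n}\,(S^2)^{(n-3)/2}\, e^{-n(\bar{x}-\mu)^2/2\sigma^2 - S^2/(2\sigma^2)}}{\sqrt{2\pi}\,\sigma\, \Gamma\!\left(\frac{n-1}{2}\right) (2\sigma^2)^{(n-1)/2}} .$$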


To find $f(y, \bar{x}, S^2)$ consider the joint density of the sample. This may
be expressed as the joint density of the mutually independent sample
statistics y, $\bar{x}'$, and $S'^2$ where

$$\bar{x}' = \sum_{i=2}^{n} \frac{x_i}{n-1}, \qquad S'^2 = \sum_{i=2}^{n} (x_i - \bar{x}')^2 .$$
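
$$\frac{(S'^2)^{(n-4)/2}\, e^{-S'^2/2\sigma^2}}{\Gamma\!\left(\frac{n-2}{2}\right)(2\sigma^2)^{(n-2)/2}} \cdot \frac{\sqrt{n-1}}{\sqrt{2\pi}\,\sigma}\, e^{-(n-1)(\bar{x}'-\mu)^2/2\sigma^2} \cdot \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-(y-\mu)^2/2\sigma^2} . \tag{2}$$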





Now, let

$$\bar{x}' = \frac{n\bar{x}}{n-1} - \frac{y}{n-1}, \qquad S'^2 = S^2 - \frac{n}{n-1}(\bar{x} - y)^2, \qquad y = y .$$


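Under this transformation the expression (2) becomes

$$K \left[ S^2 - \frac{n}{n-1}(\bar{x}-y)^2 \right]^{(n-4)/2} \exp\left\{ -\frac{n-1}{2\sigma^2} \left[ \frac{(y-\mu)^2}{n-1} + \left( \frac{n\bar{x}-y}{n-1} - \mu \right)^2 + \frac{S^2}{n-1} - \frac{n(\bar{x}-y)^2}{(n-1)^2} \right] \right\} , \tag{3}$$

where

$$K = \frac{n}{\sqrt{n-1}\; 2\pi\sigma^2\, (2\sigma^2)^{(n-2)/2}\, \Gamma\!\left(\frac{n-2}{2}\right)} ,$$

and the variables are subject to the restrictions

$$-\infty < y < \infty, \qquad 0 \le S^2 \le \infty, \qquad \left| \frac{\bar{x}-y}{S} \right| \le \sqrt{\frac{n-1}{n}} .$$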

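Dividing the expression (3) by $h(\bar{x}, S^2)$ results in the conditional
density of y, given $\bar{x}$ and $S^2$:

$$\sqrt{\frac{n}{n-1}}\; \frac{\Gamma\!\left(\frac{n-1}{2}\right)}{\sqrt{\pi}\; \Gamma\!\left(\frac{n-2}{2}\right)}\; \frac{1}{S} \left[ 1 - \left( \frac{\bar{x}-y}{S} \right)^2 \frac{n}{n-1} \right]^{(n-4)/2} . \tag{4}$$

If now the transformation

$$z = \frac{1}{2} + \frac{1}{2}\, \frac{y-\bar{x}}{S} \sqrt{\frac{n}{n-1}}$$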
is made, the density of the random variable z is obtained:

$$g(z) = \frac{\Gamma(n-2)}{\Gamma\!\left(\frac{n-2}{2}\right) \Gamma\!\left(\frac{n-2}{2}\right)}\; z^{(n/2-1)-1} (1-z)^{(n/2-1)-1}, \qquad 0 \le z \le 1 . \tag{5}$$

This will be recognized as a symmetrical Beta distribution with
parameters (n/2) - 1. Hereafter, g(z) dz will be denoted by

$$d\beta\!\left( \frac{n}{2} - 1 \right) .$$

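Since $\hat{p}(\bar{x}, S^2) = \Pr\{y > U \mid \bar{x}, S^2\} + \Pr\{y < L \mid \bar{x}, S^2\}$, then in terms of z

$$\hat{p}(\bar{x}, S^2) = \Pr\left\{ z \ge \frac{1}{2} + \frac{1}{2}\, \frac{U-\bar{x}}{S} \sqrt{\frac{n}{n-1}} \right\} + \Pr\left\{ z \le \frac{1}{2} - \frac{1}{2}\, \frac{\bar{x}-L}{S} \sqrt{\frac{n}{n-1}} \right\}$$

$$= \int_0^{\max[0,\; 1/2 - (U-\bar{x})\sqrt{n}/2S\sqrt{n-1}]} d\beta\!\left( \frac{n}{2} - 1 \right) + \int_0^{\max[0,\; 1/2 - (\bar{x}-L)\sqrt{n}/2S\sqrt{n-1}]} d\beta\!\left( \frac{n}{2} - 1 \right) .$$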


VI-5. THE UNKNOWN-σ ACCEPTANCE CRITERION


Just as in the case where the population variance is known the
acceptance procedure is: Accept if $\hat{p}(\bar{x}, S^2) \le p^*$, where p* is so chosen that
if the population fraction defective is p, the probability of acceptance
will be $L_p$. It is shown that in the one-sided case, i.e., where there is
only one specification limit, say U, this procedure is equivalent to the
well-known and widely used test $\bar{x} + ks \le U$, where $s = S/\sqrt{n-1}$.
Let $\beta_{p^*}$ be defined by $\int_0^{\beta_{p^*}} d\beta(n/2 - 1) = p^*$; then $\hat{p} \le p^*$ if and only
if

$$\frac{1}{2} - \frac{1}{2}\, \frac{U-\bar{x}}{S} \sqrt{\frac{n}{n-1}} \le \beta_{p^*} ,$$

and since $s = S/\sqrt{n-1}$ this is equivalent to

$$\bar{x} + \frac{n-1}{\sqrt{n}}\, (1 - 2\beta_{p^*})\, s \le U .$$

If k is taken equal to $(n-1)(1 - 2\beta_{p^*})/\sqrt{n}$, the procedures $\hat{p} \le p^*$
and $\bar{x} + ks \le U$ will have identical OC curves. In this paper the
acceptance criteria $\hat{p}$ based on the statistic $(U-\bar{x})/s$ are tabled rather than
$(U-\bar{x})/S$, to conform with the common use of the square root of the
unbiased estimator of $\sigma^2$.
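
The relation between p* and k can be explored numerically. The following sketch finds $\beta_{p^*}$ by bisection on the symmetric Beta integral and then forms $k = (n-1)(1 - 2\beta_{p^*})/\sqrt{n}$. The function names, the Simpson integration and the bisection are illustrative choices made here, not the method used to prepare the tables; the sketch assumes p* < 1/2 and n ≥ 3.

```python
import math


def symmetric_beta_cdf(x, a, steps=2000):
    """CDF at x of the Beta(a, a) distribution, by Simpson's rule.

    Uses the substitution z = sin(theta)**2 so the integrand stays finite.
    """
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    theta_max = math.asin(math.sqrt(x))
    h = theta_max / steps  # steps must be even

    def integrand(theta):
        return 2.0 * (math.sin(theta) * math.cos(theta)) ** (2.0 * a - 1.0)

    total = integrand(0.0) + integrand(theta_max)
    for i in range(1, steps):
        total += (4.0 if i % 2 else 2.0) * integrand(i * h)
    log_norm = math.lgamma(2.0 * a) - 2.0 * math.lgamma(a)
    return math.exp(log_norm) * total * h / 3.0


def beta_point(p_star, n, tol=1e-10):
    """Solve for beta_{p*}: the Beta(n/2 - 1) integral from 0 to it is p*."""
    a = n / 2.0 - 1.0
    lo, hi = 0.0, 0.5  # p* < 1/2, so the solution lies below 1/2
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if symmetric_beta_cdf(mid, a) < p_star:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def k_factor(p_star, n):
    """k = (n - 1)(1 - 2 beta_{p*}) / sqrt(n)."""
    return (n - 1) * (1.0 - 2.0 * beta_point(p_star, n)) / math.sqrt(n)


# Plan of the resistance example: n = 10, p* = 1.30%.
n, p_star = 10, 0.0130
k = k_factor(p_star, n)
print(f"k = {k:.3f}")
```

With this k, the test $\bar{x} + ks \le U$ and the test $\hat{p}_U \le p^*$ accept exactly the same samples.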

For the case where there is a double specification limit one of the 
authors made numerical investigations [11] of the OC curves for vari- 
ous divisions of the per cent defective. It was evident that the band 
was so narrow that for all practical purposes it can be assumed to be 
a single OC curve, i.e., the OC curve of the one sided plan. 


VI-6. THE ACCEPTANCE PROCEDURE AND ESTIMATE BASED
ON THE RANGE R


Since the pair of sample statistics $(\bar{x}, R)$, where R is the sample range
or average range,¹⁰ is not sufficient for the normal distribution when the
mean and variance are both unknown, no uniformly minimum variance
estimate of p which is a function of $\bar{x}$ and R can be derived. In this
paper the following estimate, $\hat{p}(\bar{x}, R)$, is used:

$$\hat{p}(\bar{x}, R) = \int_0^{\max[0,\; 1/2 - a(U-\bar{x})/2\nu R]} d\beta\!\left( \frac{\nu+1}{2} - 1 \right) + \int_0^{\max[0,\; 1/2 - a(\bar{x}-L)/2\nu R]} d\beta\!\left( \frac{\nu+1}{2} - 1 \right) ,$$

where a and $\nu$ are constants for fixed n, which will be explained in a
subsequent paragraph of this section.

No difficulty is encountered in connection with OC curves
(which were obtained by numerical integration) for the acceptance
procedure $\hat{p}(\bar{x}, R) \le p^*$ since $\Pr\{\hat{p} \le p^*\} = \Pr\{\bar{x} + kR \le U\}$ whenever
$k = \nu(1 - 2\beta_{p^*})/a$. The procedure to accept if $\bar{x} + kR \le U$ is of course the
one commonly used in place of the sample standard deviation in
variables sampling.

A heuristic justification for the statistic $\hat{p}(\bar{x}, R)$ as an estimate of p
is the following. It has been verified by numerical investigation that
the statistic $a(U-\bar{x})/R$ is approximately distributed as non-central t
with degrees of freedom $\nu$ and eccentricity $\sqrt{\nu+1}\,(U-\mu)/\sigma$, whenever
$\mu$ and $\sigma$ are such that $(U-\mu)/\sigma$ is sufficiently large, i.e., when the
population fraction defective is small. The numbers a and $\nu$ are constants
for each fixed sample size n and are determined as follows. Under
the assumption that $a(U-\bar{x})/R$ is distributed as non-central
t with $\nu$ degrees of freedom and eccentricity $\sqrt{\nu+1}\,K_p$ for small p, then

$$\Pr\left\{ \bar{x} + \frac{k}{a} R \le U \right\} = \Pr\left\{ \bar{x}_{\nu+1} + \frac{k}{\sqrt{\nu+1}}\, s_\nu \le U \right\}$$

10 Whenever the total sample size is a multiple of 5, the subgroup size is taken as 5, so that R is the
average range of m subgroups of 5 (the sample consisting of 5m observations). For sample sizes of 3, 4,
and 7, R is taken as the range of the sample.

where $\bar{x}_{\nu+1}$ and $s_\nu$ are the sample mean and sample standard deviation
based on $\nu$ observations. Equating the first two moments of the statistics
$\bar{x} + (k/a)R$ and $\bar{x}_{\nu+1} + (k/\sqrt{\nu+1})\,s_\nu$ and solving for a and $\nu$ gives a
as a function of $\nu$ and $\nu$ as a function of n and k. In fact $a = (d_2\sqrt{\nu})/c_\nu$
where $d_2$ is defined by $E(R) = d_2\sigma$ and $c_\nu$ is defined by $E(s_\nu) = c_\nu\sigma$. In
order to make the constants independent of k the values for the limiting
case, i.e., when p goes to zero, were chosen.¹¹

This approximation was verified by the authors by numerical integration
for the statistic $(U-\bar{x})/R$ and reference to tables of the non-central
t. It was found to be very good for both large and small n for
most of the AQL values used in this report. It is especially good for
small n regardless of p. For the larger values of p where n is large the
approximation did not hold with great accuracy. However, it could
have been improved by making $\nu$, and hence a, functions of p as well
as of n. This was not done because this would have entailed increasing
the number of tables of estimates, thereby complicating use of the
tables by sheer bulk. Furthermore, the effect of this on the estimate
$\hat{p}(\bar{x}, R)$ would have been slight since on the average the sample values
of $(U-\bar{x})/R$ fall in the region of the tables where the estimate $\hat{p}$
changes very little as n changes.

Essentially the above argument indicates that $\hat{p}(\bar{x}, R)$ is approximately
distributed as $\hat{p}(\bar{x}_{\nu+1}, s_\nu)$. Hence $\hat{p}(\bar{x}, R)$ has sampling properties
similar to those of $\hat{p}(\bar{x}_{\nu+1}, s_\nu)$, where the effective number of observations
is $\nu + 1$.


VI-7 METHODS USED IN COMPUTING THE OPERATING- 
CHARACTERISTIC CURVES 


The OC curves shown in Figures 1-16 were computed for one-sided
sampling procedures based on the estimate $\hat{p}(\bar{x}, s)$ of Section VI-4. As
mentioned previously, $\hat{p} \le p^*$ if and only if $\bar{x} + ks \le U$, where k is given
by the relation $k = (n-1)(1 - 2\beta_{p^*})/\sqrt{n}$ and $\beta_{p^*}$ is defined by

$$\int_0^{\beta_{p^*}} d\beta\!\left( \frac{n}{2} - 1 \right) = p^* .$$

Therefore

$$\Pr\{\hat{p} \le p^*\} = \Pr\{\bar{x} + ks \le U\} = \Pr\left\{ \frac{U-\bar{x}}{s} \ge k \right\} = \Pr\left\{ \frac{\dfrac{\sqrt{n}(U-\mu)}{\sigma} - \dfrac{\sqrt{n}(\bar{x}-\mu)}{\sigma}}{s/\sigma} \ge \sqrt{n}\,k \right\} .$$

The quantity on the left inside the last bracket has the non-central t
distribution with degrees of freedom $f = n-1$ and eccentricity
$\delta = \sqrt{n}\,(U-\mu)/\sigma$. If it is given that the fraction of a normal population
exceeding U is equal to p, then $(U-\mu)/\sigma = K_p$ where $K_p$ is defined by

$$\int_{K_p}^{\infty} \frac{e^{-t^2/2}}{\sqrt{2\pi}}\, dt = p .$$

In this case the parameter $\delta$ is equal to $\sqrt{n}\,K_p$. The points on the OC
curves were computed by finding in the Johnson and Welch tables [7]
the probability that a non-central t variate with parameters $n-1$ and
$\sqrt{n}\,K_p$ would exceed the value $\sqrt{n}\,k$.

11 The value of $\nu$ obtained in this way is exactly that obtained by Patnaik, who suggested that
$\sqrt{n}\,c(U-\bar{x})/R$ may be approximated by non-central t with degrees of freedom $\nu$ but with eccentricity
$\sqrt{n}\,(U-\mu)/\sigma$. This approximation is excellent and will hold for all p but would not fit into the framework
of the proposed estimate $\hat{p}(\bar{x}, R)$. See [10].

To include separate sets of OC curves for the average-range plans
and the known-σ plans would have been impracticable simply because
of sheer bulk. Therefore it was deemed necessary to use the same sets
of graphs for all three types of sampling plans as well as for the
two-sided plans.

In the case of known-σ this was done by altering the sample sizes
and the values of the acceptance criteria so as to obtain OC curves
passing through two selected points on the curves given in Figures 1-16.
It is evident that specifying two points on the OC curve of a sampling
plan uniquely determines the sample size n and the value of the
criterion p*. Since for a known-σ plan $\hat{p}(\bar{x}) \le p^*$ if and only if
$\bar{x} \le U - \sqrt{(n-1)/n}\,K_{p^*}\sigma$, and $\bar{x}$ is a normally distributed variate, it was
only necessary to solve the equations

$$\sqrt{n}\,K_{p_1} - \sqrt{n-1}\,K_{p^*} = K_{1-L_{p_1}} ,$$
$$\sqrt{n}\,K_{p_2} - \sqrt{n-1}\,K_{p^*} = K_{1-L_{p_2}}$$

for n and p* in order to obtain an OC curve passing through the two
points $(p_1, L_{p_1})$ and $(p_2, L_{p_2})$. In particular, the curves were matched
for the point for which $p_1 = \mathrm{AQL}$, and the point for which $L_{p_2} = .10$.
Since the sample size is necessarily an integer and the solutions of the
equations do not yield integer values for n, the match is not exact.
However for $n \ge 10$ it is exact within the limits of accuracy inherent in
reading the graphs.
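
The two-point matching for known-σ plans can be sketched as follows. The sketch rests on the fact that, for a known-σ plan, the probability of acceptance at fraction defective p is $\Phi(\sqrt{n}\,K_p - \sqrt{n-1}\,K_{p^*})$, where $\Phi$ is the standard normal distribution function. The function names and the two OC points used below are hypothetical and chosen only for illustration; they are not values read from Figures 1-16.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()


def K(p):
    """Upper-tail point K_p: the area above K_p under the normal curve is p."""
    return STD_NORMAL.inv_cdf(1.0 - p)


def prob_accept(p, n, p_star):
    """OC ordinate of a one-sided known-sigma plan at fraction defective p."""
    return STD_NORMAL.cdf(math.sqrt(n) * K(p) - math.sqrt(n - 1) * K(p_star))


def match_known_sigma_plan(p1, l1, p2, l2):
    """Solve for (n, p*) so the OC curve passes through (p1, l1) and (p2, l2).

    Subtracting the two matching equations eliminates p* and gives sqrt(n);
    p* then follows from either equation. n is returned unrounded.
    """
    root_n = (K(1.0 - l1) - K(1.0 - l2)) / (K(p1) - K(p2))
    n = root_n ** 2
    k_p_star = (root_n * K(p1) - K(1.0 - l1)) / math.sqrt(n - 1.0)
    p_star = 1.0 - STD_NORMAL.cdf(k_p_star)
    return n, p_star


# Hypothetical OC points: acceptance .95 at p = 1%, acceptance .10 at p = 8%.
n, p_star = match_known_sigma_plan(0.01, 0.95, 0.08, 0.10)
print(f"n = {n:.2f}, p* = {100 * p_star:.2f}%")
print(f"check: {prob_accept(0.01, n, p_star):.3f}, {prob_accept(0.08, n, p_star):.3f}")
```

In practice n would be rounded to an integer and p* recomputed from one of the two equations, which is why the match described above is not exact.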

In the case of average-range plans the above procedure was altered 
in the following ways. For sample size code letters B through G the 
same sample size was used as in the corresponding plans based on the 
sample standard deviation. The acceptance criterion was changed so 
that the probability of acceptance for a range plan was the same at 
the AQL as for a standard deviation plan. Then a few additional 
points on the resulting OC curve were computed to check on the magni- 
tude of the deviation between the two curves. For no case was this 
deviation so large that the resulting curve was closer to the adjacent 
curves on the graph than it was to the one it was intended to match. 
For sample size code letters beyond G a shift was made to a new 
sample size for each code letter in order to facilitate the matching 
which was then carried out as in the case of code letters B through 
G. All computations used in this connection involved computing
$\Pr\{\bar{x} + kR \le K_p \mid p\}$ by numerical integration using tables of the
probability distribution of the average range which were obtained
from [12].


(Tables and Charts are given on following pages) 

FIG. 6. Sampling Plans for Sample Size Code Letter G. Operating characteristic curves for sampling plans based on unknown standard deviation. (Curves for sampling plans based on average range and known standard deviations are essentially equivalent.)

PRACTICAL APPLICATIONS OF THE THEORY OF
EXTREME VALUES* 


BRADFORD F. KIMBALL 
New York State Public Service Commission 


E. J. GUMBEL delivered a series of four lectures on the theory
of extreme values at the National Bureau of Standards under the
sponsorship of the Applied Mathematics Division in the fall of 1949.
An account of these lectures has recently appeared in published form
as one of the Applied Mathematics Series of the National Bureau of
Standards. At present this publication represents the only comprehensive
source of material on this subject, including a concise account
of both the theory and some of its applications.

Professor Gumbel was the first to call the attention of engineers and
statisticians in this country to possible applications of the formal
“extreme-value” theory to certain distributions which had previously
been treated empirically [3, 4]. The first type of problem so treated in
this country had to do with meteorological phenomena: annual
flood flows, precipitation maxima, etc. This was in 1941. Since that
time the field of fruitful applications has increased greatly, and it
continues to increase.

Someone might ask at this point: “Just what is meant by a distribution
of extreme values?” Essentially an extreme value is an ordered
sample value. Thus a sample of n values $x_i$ is arranged in ascending or
descending order of magnitude so that the subscript i indicates order.
If ordered from low to high, $x_1$ becomes the lowest extreme and $x_n$ the
highest extreme. One may also consider the “mth” extreme, which refers
to the mth ordered value proceeding upwards or downwards from one
end of the series. If now a succession of samples is taken from the same
universe, interest may center about a single extreme value of each such
sample. In many problems this extreme taken from each sample may
be the only value recorded, as in maximum annual flood flows or
extinction times of bacteria. The question then arises as to the type of
probability distribution which may be expected to apply to such a series
of observed extremes. It is to the study of the theoretical forms of such
distributions and their applications that Gumbel’s lectures are
addressed.

* Statistical Theory of Extreme Values and Some Practical Applications, by E. J.
Gumbel, A Series of Lectures, National Bureau of Standards, Applied Mathematics Series, 33
(Washington, D. C.: U. S. Government Printing Office, 1954), and Probability Tables for the
Analysis of Extreme-Value Data, National Bureau of Standards, Applied Mathematics Series, 22
(Washington, D. C.: U. S. Government Printing Office, 1953).

The published lectures cover 51 pages including a bibliography of 
some 80 references. The material is arranged as follows: Lecture 1, 
Survey of Practical Applications of Extreme-Value Theory; Lecture 2, 
Exceedances, Return Periods and Probability Papers; Lecture 3, Exact 
and Asymptotic Distributions of Extremes; Lecture 4, Applications. 
Thus, the theory of extreme-value distributions is not treated until 
the third lecture. The reason for this is that certain of the methods 
developed to handle extreme-value data are general and can be applied 
to other types of distributions. These methods are accordingly intro- 
duced first, with the thought that they can be applied to more general 
types of distributions. 

It will possibly clarify the comments which are to follow if at this 
point the specific form of the extreme-value distribution which furnishes 
the pièce de résistance of these lectures is set forth. This is the
extreme-value distribution of Type I (see (3.16) and (3.17) of the Lectures).
It is the asymptotic form of the exact distribution of maximum values
in samples from a universe whose distribution is such that its upper tail
declines to zero like an exponential. Normal and chi-square distributed 
universes are of this type. It is sometimes referred to as the doubly 
exponential distribution of maximum values since it is the only one of 
the three types which is a double exponential. 

The probability density function $\varphi$ and the cumulative probability
$\Phi$ are usually expressed by means of the equations

$$\varphi(y) = e^{-y - e^{-y}}, \qquad \Phi(y) = e^{-e^{-y}}, \qquad -\infty < y < \infty \tag{1}$$

$$y = \alpha(x - u) \tag{2}$$

where the parameters are $\alpha$ and $u$ [5]. The intermediate variable $y$
is often referred to as the “reduced” variate and $\alpha$ or $1/\alpha$ as the “scale”
parameter since it relates the scale of measure applicable to the observed
variate $x$ to that of the intrinsic variable $y$. The parameter $u$
turns out to be the mode of the extreme-value distribution. Hereafter
when we refer to the “extreme-value” distribution without indication of
type we shall mean the distribution (1) above.
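
A minimal Python sketch of equations (1) and (2) may help fix ideas. The function names and the parameter values (alpha = 0.5, u = 10) are illustrative assumptions and are not taken from the Lectures; the grid search merely confirms numerically that the density of x peaks at x = u.

```python
import math

def reduced_variate(x, alpha, u):
    """Equation (2): y = alpha * (x - u)."""
    return alpha * (x - u)

def phi(y):
    """Density of the reduced variate, equation (1): exp(-y - exp(-y))."""
    return math.exp(-y - math.exp(-y))

def Phi(y):
    """Cumulative probability of the reduced variate, equation (1)."""
    return math.exp(-math.exp(-y))

def density_x(x, alpha, u):
    """Density of the observed variate x; the factor alpha is dy/dx."""
    return alpha * phi(reduced_variate(x, alpha, u))

if __name__ == "__main__":
    alpha, u = 0.5, 10.0  # hypothetical parameter values, for illustration only
    grid = [u + 0.01 * k for k in range(-500, 501)]
    mode = max(grid, key=lambda x: density_x(x, alpha, u))
    print("grid point of maximum density:", round(mode, 2))
    print("Phi at the mode (y = 0):", Phi(0.0))
```

At the mode the reduced variate is zero, so the cumulative probability there is exp(-1), a little under 0.37, whatever the values of the two parameters.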

For purposes of graphical fitting, extreme-value plotting papers have
been prepared so that with $x$ measured along one axis distances along
the other axis are proportional to $y$, but readings along the $y$-axis are
the values of $\Phi$ given by (1). Obviously then the critical relation (2)
which determines the parameters, is a straight-line relation on this
plotting paper. Most of the charts illustrating applications in Lecture 4
are based on (1) and (2) above. They deal with a series of extreme
values which itself is completely ordered as to size. The curious problem
then arises as to how the plotting points $(x_m, \Phi(y_m))$ should be chosen in
order to represent the data fairly. The simplest convention is to employ
the general relation applicable to any distribution

$$E[\Phi(y_m)] = \frac{m}{n+1} \tag{3}$$

and accordingly take $\Phi$ equal to $m/(n+1)$ for an observation $x_m$. This
convention might be referred to as the convention of “equidistant spacing”
since $\Phi(y_{m+1}) - \Phi(y_m) = 1/(n+1)$ for any value of $m$. It is the method
preferred by Gumbel [2, pp. 13-15].
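
Convention (3) can be sketched in a few lines of Python. The sample values below are hypothetical numbers chosen only to exercise the function, and the name `plotting_positions` is an assumption of this sketch. Each ordered observation is paired with the probability m/(n+1) and with the reduced variate obtained by inverting the cumulative probability in (1).

```python
import math

def plotting_positions(sample):
    """Return (x_m, Phi_m, y_m) for each ordered observation, Phi_m = m/(n+1)."""
    xs = sorted(sample)                 # x_1 <= x_2 <= ... <= x_n
    n = len(xs)
    points = []
    for m, x in enumerate(xs, start=1):
        prob = m / (n + 1)              # convention (3)
        y = -math.log(-math.log(prob))  # inverse of Phi(y) = exp(-exp(-y))
        points.append((x, prob, y))
    return points

if __name__ == "__main__":
    maxima = [412.0, 355.0, 530.0, 298.0, 467.0]  # hypothetical observed extremes
    for x, prob, y in plotting_positions(maxima):
        print(f"{x:7.1f}  {prob:.4f}  {y:+.4f}")
```

Plotting x against y on ordinary graph paper is equivalent to plotting x against the probability scale on extreme-value paper, since the paper's probability axis is ruled in proportion to y.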

Another choice of the plotting positions of $\Phi(y_m)$ is to use the relation

$$\Phi = \Phi[E(y_m)]$$

which has the advantage that it eliminates a bias when the line is
fitted by eye. This method is more complicated in that $E(y_m)$ has to be
taken from a table or chart. Such a table and chart have been prepared,
but are not yet available in the published literature [10].

One may consider the graphical fitting as a preliminary step to be 
followed by an analytical method of determining the parameters. 
Viewed in this light the simpler choice, using the relation (3), seems 
adequate. Where extrapolation and forecasts are important, analytical 
procedures of fitting the parameters are required. 

Two such methods of determining the parameters are suggested by
Gumbel in his lectures. In [3] he suggests that the method of moments
be used. This results in the relations

$$\frac{1}{\alpha} = \frac{\sqrt{6}}{\pi}\, s_x, \qquad u = \bar{x} - \gamma/\alpha \tag{4}$$

where $\bar{x}$ and $s_x$ denote the mean and standard deviation of the observed
values of $x$, and $\gamma$ denotes Euler’s constant [2, (3.29) and
section on Estimation of Parameters]. These are derived directly from
the mean and standard deviation of the reduced variable $y$, which
are

$$\bar{y} = \gamma, \qquad \sigma_y = \pi/\sqrt{6}. \tag{5}$$
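
The moment relations (4) translate directly into code. The following is a minimal sketch; the function name is illustrative, and the use of the sample standard deviation with divisor n - 1 is an assumption, since the text does not state which divisor is intended.

```python
import math
import statistics

EULER_GAMMA = 0.5772156649015329  # Euler's constant

def fit_moments(sample):
    """Method-of-moments estimates (alpha, u) from relations (4)."""
    x_bar = statistics.fmean(sample)
    s_x = statistics.stdev(sample)            # divisor n - 1 (an assumption)
    inv_alpha = math.sqrt(6) / math.pi * s_x  # 1/alpha = (sqrt(6)/pi) * s_x
    u = x_bar - EULER_GAMMA * inv_alpha       # u = x_bar - gamma/alpha
    return 1.0 / inv_alpha, u

if __name__ == "__main__":
    data = [412.0, 355.0, 530.0, 298.0, 467.0]  # hypothetical observed extremes
    alpha, u = fit_moments(data)
    print("alpha =", alpha, " u =", u)
```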


Later Gumbel derived intuitively a modified form of this procedure,
given by

$$\frac{1}{\alpha} = \frac{s_x}{\sigma(n)}, \qquad u = \bar{x} - \frac{\bar{y}(n)}{\alpha} \tag{6}$$

where $\bar{y}(n)$ denotes the mean value of $y$ found from the $n$ plotting
positions $\Phi(y_m) = m/(n+1)$, and $\sigma(n)$ denotes the standard deviation
of these values of $y$ [2, Lectures 2 and 3]. Gumbel points out that as
$n$ becomes infinite $\bar{y}(n)$ and $\sigma(n)$ approach the mean and standard
deviation for the continuous case, if such exist. He also points out that
the above relation (6) is general in character, applying to any distribution
such that the cumulative probability is a known function of an
intermediate variable $y$, which in turn is an unknown linear function
of the observed variable $x$, as in (2) above.

This is the method of determining the parameters which is recommended
by Gumbel, as indicated in the Summary of Procedures on
page 46. The quantities $\bar{y}(n)$ and $\sigma(n)$ depend upon the sample size $n$.
On page 29 of the lectures may be found a table of these quantities for
various sample sizes, and directions for interpolating between tabulated
values. A table applicable to fitting the normal distribution by this
method is also shown on page 17.
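
The quantities entering (6) can also be computed directly rather than read from a table. The sketch below is illustrative only: the function names are assumptions, and both standard deviations are taken with divisor n, a choice the text does not specify.

```python
import math
import statistics

def reduced_moments(n):
    """Mean and standard deviation of y_m = -ln(-ln(m/(n+1))), m = 1..n."""
    ys = [-math.log(-math.log(m / (n + 1))) for m in range(1, n + 1)]
    return statistics.fmean(ys), statistics.pstdev(ys)  # divisor n (assumption)

def fit_gumbel_modified(sample):
    """Estimates (alpha, u) from relation (6)."""
    y_bar_n, sigma_n = reduced_moments(len(sample))
    x_bar = statistics.fmean(sample)
    s_x = statistics.pstdev(sample)        # same divisor as sigma(n) (assumption)
    inv_alpha = s_x / sigma_n              # 1/alpha = s_x / sigma(n)
    u = x_bar - y_bar_n * inv_alpha        # u = x_bar - y_bar(n)/alpha
    return 1.0 / inv_alpha, u

if __name__ == "__main__":
    for n in (10, 100, 1000):
        print(n, reduced_moments(n))
    print("limits:", 0.5772156649, math.pi / math.sqrt(6))
```

Running the loop shows how slowly the two quantities approach their limiting values in (5), which is why (4) and (6) can differ appreciably for samples of moderate size.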

The derivation of (6) is set forth in the section Fitting Straight Lines
in Lecture 2. Having plotted the data using convention (3), Gumbel
sets up the moment conditions of least squares for minimizing vertical
distances and horizontal distances, thus arriving at two sets of estimates
of the parameters $1/\alpha$ and $\mu$ (the parameter $\mu$ is used in Lecture
2 where the distribution function is not necessarily that of extreme
values). He then takes the geometric mean of these two sets of estimates
and in this way arrives at the relation (6) above, which appears
as (2.20) and (3.39) in the lectures.

One might note that the relation (6) is the relation which would be
obtained if the line were fitted to the normalized variables $(y - \bar{y}(n))/\sigma(n)$
and $(x - \bar{x})/s_x$ by minimizing the sum of the squares of the perpendicular
distances from the plotted points to the fitted line.
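
The geometric-mean step can be checked numerically. In the sketch below (names and data are hypothetical) the two least-squares fits are both expressed as slopes dx/dy of the line x = u + y/alpha; their geometric mean reduces to the ratio of the two standard deviations, which is the first relation in (6).

```python
import math
import statistics

def regression_slopes(ys, xs):
    """Slopes dx/dy from the two least-squares fits of x = u + y/alpha."""
    y_bar, x_bar = statistics.fmean(ys), statistics.fmean(xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_xy = sum((y - y_bar) * (x - x_bar) for y, x in zip(ys, xs))
    slope_a = s_xy / s_yy  # deviations measured along the x direction
    slope_b = s_xx / s_xy  # deviations measured along the y direction
    return slope_a, slope_b

def geometric_mean_slope(ys, xs):
    """Geometric mean of the two slopes; assumes positive correlation."""
    a, b = regression_slopes(ys, xs)
    return math.sqrt(a * b)

if __name__ == "__main__":
    ys = [-0.9, -0.2, 0.4, 1.1, 2.3]  # hypothetical reduced variates
    xs = [3.1, 4.0, 5.2, 6.1, 9.0]    # hypothetical ordered observations
    print(geometric_mean_slope(ys, xs))
    print(statistics.pstdev(xs) / statistics.pstdev(ys))
```

The two printed numbers agree because the cross-product term cancels in the product of the slopes, leaving the square root of the ratio of the two sums of squares.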

The method does not consider fundamental questions of bias and 
efficiency. Although these lectures were delivered to an audience inter- 
ested primarily in applications, one should call attention to the fact 
that the above method is a tour de force in the interests of producing 
a simple, general method which will appeal to the practical man, and 
be sufficiently accurate for his purposes. Since the method is recommended
in the Summary for use in fitting the distribution of extreme
values we refer the reader to an investigation made by Lieblein [11,
p. 48] in which he concludes from empirical sampling tests that what
he calls “the original Gumbel estimator” (referring to (3.39) of the Lectures
and (6) above) “is more biased and much less efficient than the
simplified form” (referring to (3.29) of the Lectures and (4) above).

One might remark that this whole problem of fitting the extreme- 
value distribution has proved to be a knotty one. The efficient, maxi- 
mum-likelihood estimate of the parameters is somewhat complicated, 
especially for large samples [8]. Another method [8, (4.5)] retains the 
qualification of sufficiency in the extended sense, but is more or less 
complicated. In the interests of simplicity Lieblein has introduced a 
method applicable to samples of any size based on six (or less) order 
statistics which he finds more efficient than the Gumbel method, 
using (4) above. This has an efficiency ratio of about 80 per cent rela- 
tive to the maximum likelihood estimate for theoretical x (defined more 
precisely as the Cramér-Rao lower-bound), where the cumulative 
probability is 0.99. In terms of annual flood estimates this means that 
the estimate of annual flood which is expected to be reached or exceeded 
1 per cent of the time has an efficiency ratio of 80 per cent [11, Table
IV].

One may conclude that for many practical purposes the method of 
plotting indicated by (3) above which is recommended by Gumbel, 
and the method of fitting indicated by Gumbel’s equations (3.29) and 
by (4) above, will be satisfactory. At the time the lectures were given 
Lieblein’s work on the use of selected order statistics as estimators 
was not available. This should be looked into if forecasting is critical. 
If an intensive job on a single series is to be done, it would be well to 
compare results obtained by the method of maximum likelihood or the 
similar method which also uses “sufficient statistical estimation func- 
tions” [8, (4.5) and (5.4)]. 

The principal warning signal which this reviewer believes should be
set up for the lay reader is in connection with Gumbel’s so-called
“control curves”. These control curves are based for the most part on
the formulas

$$\sqrt{n}\,\sigma(y_m) = \frac{\sqrt{\Phi(y_m)\,[1 - \Phi(y_m)]}}{\varphi(y_m)}, \qquad \sigma(x_m) = \frac{\sqrt{n}\,\sigma(y_m)}{\sqrt{n}\,\alpha} \tag{7}$$

where $y_m$ denotes the reduced variate of the mth ordered value, $x_m$
the mth ordered value observed, and $\alpha$ the population parameter
(see sections entitled Control Curves on pp. 17, 27, 31 and 48). This
means that Gumbel prefers to ignore the sampling error involved in
estimating the position of the fitted distribution curve (straight line on
probability plotting paper).
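
A short Python sketch of formulas (7) follows. It treats the parameters as known constants, which is exactly the simplification the reviewer is drawing attention to. The function names, the choice of one standard error as the half-width, and the parameter values are all assumptions made for illustration.

```python
import math

def sigma_y(prob, n):
    """Standard error of the reduced variate at cumulative probability prob, per (7)."""
    y = -math.log(-math.log(prob))
    density = math.exp(-y - math.exp(-y))  # phi(y) from equation (1)
    return math.sqrt(prob * (1.0 - prob) / n) / density

def control_curves(n, alpha, u, width=1.0):
    """Rows (m, lower, centre, upper) about the line x = u + y/alpha."""
    rows = []
    for m in range(1, n + 1):
        prob = m / (n + 1)
        y = -math.log(-math.log(prob))
        centre = u + y / alpha                   # relation (2) solved for x
        half = width * sigma_y(prob, n) / alpha  # sigma(x_m) = sigma(y_m)/alpha
        rows.append((m, centre - half, centre, centre + half))
    return rows

if __name__ == "__main__":
    for row in control_curves(n=10, alpha=0.5, u=10.0):  # hypothetical values
        print("%2d  %8.3f  %8.3f  %8.3f" % row)
```

The band depends on the data only through alpha, so it describes scatter about a line assumed to be correctly placed and says nothing about the uncertainty in the placement of that line.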

For the distribution of maximum values [11, pp. 27 and 59] Lieblein 
has derived the sampling variance of the position of the curve for an 
estimate based on order statistics. The asymptotic variance (applicable 
to large samples) for the efficient (maximum likelihood) method of 
estimate has been found by Kimball [9, (12)]. 

Apparently Gumbel does not wish to include the sampling variance 
involved in the estimation of the parameters and the resulting sampling 
variance in the position of the fitted distribution curve. This means that 
the experimenter should use the control curves only as a check on the 
amount of scatter of the data points about the fitted curve. Indirectly, 
a bias of scatter on one side or the other might indicate error in the esti- 
mated position of the distribution curve. It might be noted here that 
the theoretical measures of the scatter about the theoretical distribu- 
tion curve—see (2.25) and (2.26)—which define Gumbel’s control 
curves, are based on intrinsic properties of the distribution, and on 
the fitted data only indirectly through the involvement of the parameter
$\alpha$. Thus the control curves are a measure of the inherent dispersion
to be expected of individual observations about their theoretical 
means. An analytical test of goodness of fit would be based essentially 
on a comparison of actual behavior of deviations relative to such 
expected behavior (for example, the test devised by Sherman [12]). 

The warning signal goes up when the experimenter thinks that the 
control curves offer a confidence band for purposes of extrapolation 
or estimation of what might be expected to happen at some specific 
value of the independent variable (usually time). Researchers working 
with the conventional least squares fitting of polynomials know that 
to the variance of scatter is added the sampling variance of the position 
of the fitted curve at the point in question. An allowance of the same sort 
will have to be made in dealing with the extreme-value distribution, 
the allowance depending upon the extent to which the estimators of 
dispersion about the fitted line are statistically independent of esti- 
mators of the position of the theoretical line. Until this is done the 
complete treatment of the variance of a forecast remains an unsolved 
problem. 

The lay reader might well start with Lecture 4 and work backwards 
by first referring to the section Summary of Procedures and then to 
explanations of the methods which appear in the earlier lectures. 

In detail, the subject-matter covered by the lectures is as follows: 
Lecture 1 is a sort of general introduction designed for the lay reader
and should tend to stimulate his curiosity. This reviewer finds the accounts
of the history of the theory, and the historical aspects of the
attacks on such problems as flood frequencies and breaking strength
of materials, of great interest. The author occasionally lets his enthusiasm
run away with him, however, as in the statement: “... the distributions
of floods, long studied by engineers, can be fully understood
through this theory”.

Lecture 2 opens with an introduction to the distribution-free law of 
exceedances. This is followed by a brief exposition of some of Gumbel’s 
work on “the law of rare exceedances” with application to tolerance 
limits. A more complete treatment is to be found in the article by Gum- 
bel and von Schelling [6]. 

The next four sections, The Return Period, Expected Extremes, Construction
of Probability Papers and Plotting Positions, involve valuable
expository material. These four sections, and indeed the rest of Lecture 2,
are designed to apply to an ordered sample of data taken from
any distribution where the cumulative distribution function $F(x)$ is a
completely defined monotonically increasing function of $x$ over its
range from $F = 0$ to $F = 1$. The concept of “the return period” defined
as $1/(1 - F(x))$ and denoted by $T(x)$ is introduced. This concept
Gumbel apparently believes may be useful in studying many distributions
other than that of extreme values. His statement following (2.8)
which reads: “It is the number of observations such that, on the average,
there is one observation equalling or exceeding x” might be better
phrased as: “it represents the sample size such that on the average there
would be just one observation equalling or exceeding x.”

Closely allied to the concept of the “return period” is that of an
“expected extreme.” This is defined as the value $x = u_n$ for a sample of
size $n$ such that the return period is precisely equal to the size of the
sample. Thus

$$T(u_n) = n = 1/(1 - F(u_n)), \qquad F(u_n) = 1 - 1/n. \tag{8}$$

This concept has been found useful in the analysis of the transition
from an exact distribution of extreme values to the asymptotic distribution
(see pp. 19 and 21).
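
The return period and the expected extreme of relation (8) can be illustrated for any continuous distribution function. The sketch below uses an exponential initial distribution purely as an assumed example, because its expected extreme has the closed form ln n against which the numerical solution can be compared; the function names are likewise illustrative.

```python
import math

def return_period(cdf, x):
    """T(x) = 1 / (1 - F(x))."""
    return 1.0 / (1.0 - cdf(x))

def expected_extreme(cdf, n, lo, hi):
    """Solve F(u_n) = 1 - 1/n by bisection; F must increase on [lo, hi]."""
    target = 1.0 - 1.0 / n
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if cdf(mid) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def exponential_cdf(x):
    """Assumed example of an initial distribution: F(x) = 1 - exp(-x)."""
    return 1.0 - math.exp(-x)

if __name__ == "__main__":
    n = 100
    u_n = expected_extreme(exponential_cdf, n, 0.0, 50.0)
    print("u_n =", u_n, " ln n =", math.log(n))
    print("T(u_n) =", return_period(exponential_cdf, u_n))
```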

The sections on Construction of Probability Papers and Plotting Posi- 
tions point up the problem which arises in determining the proper plot- 
ting of the frequencies of completely ordered sample data, for the pur- 
poses of graphical methods of curve fitting. The five “postulates” con- 
cerning the appropriateness of a plotting scheme, which are listed on 
page 14, should be taken with a grain of salt. For example postulate
3, “The observations should be equally spaced on the frequency
scale”, would not follow under the scheme of using $F(\bar{x}_m)$ as the proper
plotting value of the cumulative frequency $F$ for the mth ordered
observation $x_m$, where $\bar{x}_m$ denotes $E(x_m)$. In this case $F(\bar{x}_{m+1}) - F(\bar{x}_m)$
is not in general independent of $m$, and there seems to this reviewer no
inherent reason why this should be required.

The section on Fitting Straight Lines takes up the problem of the 
analytical determination of the two parameters involved, and has 
already been discussed. It should be noted that as introduced in 
Lecture 2 the method proposed is to apply to any distribution $\Phi(y)$
where $\Phi$ is known, but $y$ is a linear function of $x$ involving the two
parameters to be determined.

The section entitled An Unsolved Problem is devoted to the discus- 
sion of a linear relation which Gumbel has discovered empirically, 
which makes it possible to use tables and charts for short-cutting the 
computation of the quantities $\bar{y}(n)$ and $\sigma(n)$ which appear in the
estimation equations (2.20), as applied to the fitting of a normal distribution.
This relation is discussed again under the same section heading
on page 29 where application is made to the distribution of extreme 
values. 

Lecture 2 concludes with a section Control Curves. As mentioned 
previously in this review, these are confidence bands indicating the 
degree of scatter of the plots of observed sample data which is to be 
expected about the theoretically correct mean of each such ordered 
sample value. Gumbel points out that near the extremes the distribu- 
tion should approximate that of the extreme-value distribution more 
closely than that of the normal distribution and accordingly suggests 
that another formula be used in measuring this scatter in the neigh- 
borhood of the extremes of the data series. This is gone into by Gumbel 
in the later section Extension of Control Curves on page 27 and again 
in the section Control Curves on page 31. Lieblein has criticized the 
Gumbel choice of the parameter $\alpha_m$ as equal to $\alpha$ in this extension
process (see p. 65 of [11]). Lieblein, however, is thinking of the control 
curves as designed to include the sampling variance of the position of 
the fitted curve. In terms of the concept that the control curves indi- 
cate only the expected scatter about the true position of the fitted 
curve, Gumbel’s conclusions stated in (3.44) and (3.44a) are correct. 

Lecture 3 contains the meat of the extreme-value theory. The mathematical
analysis leading up to the three forms of the extreme value distribution
(labelled I, II, and III, pp. 21-22) is not easy to follow under
anybody’s exposition. This is contracted into three pages, and hence
the intuitive approach followed by Gumbel can be forgiven. Indeed
if the mathematically inclined statistician should supplement this by
studying the Fisher-Tippett and von Mises’ articles, Gumbel’s approach
should aid the conceptual appreciation of the transition from the exact
distribution to the asymptotic distribution. One might mention a
slip in the explanation of the Fisher functional equation on page 21

$$F^n(x) = F(a_n x + b_n).$$

Keeping to the notation used in the equation, Fisher assumes n samples
of size N (rather than N samples of size n) in order to compare the
distribution of the nth extreme among the extremes of the n smaller
samples with the extreme of the consolidated sample. Thus $F(x)$ here
refers to the cumulative distribution of maximum values, usually denoted
by $\Phi(x)$ in these lectures.

The section Estimate of Parameters must be considered at present 
out of date in coverage. The relations (3.29) constitute Gumbel’s early 
preference. As mentioned above, Lieblein finds these preferable to 
the method of (3.39) recommended in the Summary. 

The section Extreme mth Values introduces the asymptotic distribu- 
tion of the mth largest value. A curious fact about the distribution of 
the largest extreme value is that the asymptotic variance about its 
mean is equal to the variance of the parent extreme-value distribution. 

The section Return Periods is of decided interest, introducing an
ingenious asymptotic formula (3.48)

$$P(K) = e^{-1/K} - e^{-K}.$$

In terms of a return period $T$ corresponding to an extremum $x$, this
relation gives the probability that an extremum as large as $x$ should
occur for the first time after $t_1$ trials and before $t_2$ trials are made. The
formula shows that $t_1 = T/K$ and $t_2 = KT$ where the constant $K$ is
determined directly from the probability and is independent of $T$. This
formula is recommended in the Summary in connection with the extrapolation
of the control curves. Again the reader should be cautioned that
the value of $T$ in its relation to $x$ involves a sampling error and hence
the interval measures only the degree of scatter to be expected about
the true $T(x)$.
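
Since P(K) increases steadily from zero at K = 1, the constant K belonging to a chosen probability can be found by bisection. The Python sketch below assumes the formula in the form P(K) = exp(-1/K) - exp(-K); the function names, the search bracket, and the example values T = 100 and probability 0.5 are illustrative assumptions.

```python
import math

def interval_probability(K):
    """P(K) = exp(-1/K) - exp(-K)."""
    return math.exp(-1.0 / K) - math.exp(-K)

def solve_K(prob, lo=1.0, hi=100.0):
    """Find K > 1 with P(K) = prob by bisection; P is increasing for K > 1."""
    if not 0.0 < prob < interval_probability(hi):
        raise ValueError("probability outside the range covered by the bracket")
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if interval_probability(mid) < prob:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def first_occurrence_interval(T, prob):
    """Return (t1, t2) = (T/K, K*T) for the given return period and probability."""
    K = solve_K(prob)
    return T / K, K * T

if __name__ == "__main__":
    print(first_occurrence_interval(T=100.0, prob=0.5))  # hypothetical inputs
```

Because K does not depend on T, the interval scales in direct proportion to the return period, and its two ends always have T as their geometric mean.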

The present reviewer finds the last section, Weibull’s Use of the Third
Type, of particular interest for several reasons. This is of the limited
type which he at one time studied as a possibility in describing the
distribution of annual floods. Shigeo Kase [7] has found that the distribution
of tensile strengths of rubber can be described by the distribution
(3.23) with

$$y = (S - S^*)/B$$

where $S$ represents the measure of the tensile strength of specimens
with a constant cross section and $S^*$ the median. If one places the
lower limit of $S$ at zero one might replace $S$ by $k \log x$ which reduces
the distribution precisely to (3.25) with w=0. Cumulating the frequency
in the opposite sense,

$$\Phi = \exp[-(x/v)^k], \qquad \log \Phi = -(x/v)^k.$$

This form of the distribution may be studied by the use of log log
plotting paper along with the following simple formula for $E[\log \Phi_m]$

$$E[-\log \Phi(y_m)] = \frac{1}{m} + \frac{1}{m+1} + \cdots + \frac{1}{n}$$

which applies to any well-mannered cumulative distribution function.
Furthermore, the analytical process of determining the parameters
$v$ and $k$ would be considerably simplified [1].
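
The harmonic-sum formula is distribution-free because the cumulative probability of the mth ordered value behaves like the mth smallest of n uniform random numbers. The following Python sketch, with illustrative function names, seed and trial count, compares the formula with a simple simulation on that basis.

```python
import math
import random

def expected_neg_log(m, n):
    """1/m + 1/(m+1) + ... + 1/n, with m counted from the smallest value."""
    return sum(1.0 / k for k in range(m, n + 1))

def simulated_neg_log(m, n, trials=20000, seed=1):
    """Monte Carlo mean of -log of the mth smallest of n uniform numbers."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        # 1 - random() lies in (0, 1], so the logarithm is always defined
        u = sorted(1.0 - rng.random() for _ in range(n))
        total += -math.log(u[m - 1])
    return total / trials

if __name__ == "__main__":
    m, n = 2, 5
    print("formula   :", expected_neg_log(m, n))
    print("simulation:", simulated_neg_log(m, n))
```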

The rest of Lecture 3, not already referred to in the review of Lec- 
ture 2, is expository material comparing characteristics of the doubly 
exponential extreme-value distribution with the normal distribution. 

Lecture 4 is an excellent account of the application of the extreme- 
value theory to many practical problems of a diverse nature such as 
floods, maximum gust loads in aeronautics, old age, bacteria extinc- 
tion times, radioactive emissions and breaking strength of materials. 
It is well illustrated and fairly easy reading. 

This series of lectures gives a very condensed account of a great 
deal of pioneer work. Professor Gumbel is to be congratulated on the 
job that he has done. 

Attention should be called here to the recent publication of Probabil- 
ity Tables for the Analysis of Extreme-Value Data [13]. These tables 
should certainly be acquired by anyone doing work involving the ex- 
treme-value distribution. 

This publication contains the following six tables:

Table 1. Cumulative probability and density functions of extremes.

Table 2. Inverse of the cumulative probability function of extremes.

Table 3. Probability density of extremes as a function of the cumulative probability.

Table 4. Probability points $y_m$ for mth extremes, m counted from above.

Table 5. Cumulative probability $\Psi$ and density function $\psi$ for the reduced range R.

Table 6. Reduced range R as a function of the cumulative probability $\Psi$.

The first three tables will be found indispensable for work involving
the application of the theory of extreme values. The fourth table gives
the values of the reduced variate $y_m$ at the probability points .005,
.010, .025, .050, .100 and .500 at both ends of the distribution, for the
15 largest sample values and beyond this at intervals of 5 up to m=50.
The distribution function is the asymptotic distribution of the doubly
exponential type discussed on page 27 of Gumbel’s published lectures.

Tables 5 and 6 are based on the asymptotic distribution of the
“reduced range” R. Although some explanation of the concept of reduced
range is given in the Introduction, it might be well to have in
mind the relation

$$R = \frac{\pi}{\sqrt{3}}\,\frac{w - \bar{w}}{\sigma_w} + 2\gamma, \qquad \gamma = \text{Euler's constant}, \quad w = \text{observed range},$$

and refer to Gumbel’s article in the Annals of Mathematical Statistics,
“The Distribution of the Range” (Vol. 18 (1947) pp. 384-412).

The tables in other respects are clearly explained in the Introduction.
As regards accuracy one finds the statement on page 11: “The entries
in all these tables are guaranteed to about a unit in the last place
given, with the possible exception of the table of probability points for
mth extremes, which is correct to within several units in the last place,
provided that Thompson’s table (of percentage points of the $\chi^2$-distribution)
is correct to within several units in the last place.” The parenthesis
is supplied by this reviewer.

With prices of most publications what they are today, it is distinctly
a privilege to be able to obtain publications of this calibre at the prices
quoted for these.


ESTIMATE OF THE INTEGRATED NORMAL CURVE
BY MINIMUM NORMIT CHI-SQUARE WITH 
PARTICULAR REFERENCE TO BIO-ASSAY 


JOSEPH BERKSON
Mayo Clinic 


The integrated normal distribution function may be written

$$P_i = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\nu_i} e^{-z^2/2}\,dz = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\alpha+\beta x_i} e^{-z^2/2}\,dz \tag{1}$$

where $P_i$, the value of $P$ at $x = x_i$, is the area under the normal frequency
curve from $x = -\infty$ to $x = x_i$, $\mu$ is the mean of the frequency
function, $\sigma$ is the standard deviation, $\alpha = -(\mu/\sigma)$, $\beta = 1/\sigma$, and $\nu_i$ is
given by (2).

The straight-line transform of (1) is¹

$$\text{Normit } P_i = \nu_i = \frac{x_i - \mu}{\sigma} = \alpha + \beta x_i \tag{2}$$

therefore if the normit of $P$ is plotted against $x$, the points will fall on a
straight line with $\alpha$ as the intercept and $\beta$ as the slope.
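
The normit is simply the inverse of the integrated normal curve, so it can be sketched with the Python standard library. The function names are illustrative; the probit, defined here as the normit plus 5, is included only for comparison.

```python
from statistics import NormalDist

_STANDARD_NORMAL = NormalDist()  # mean 0, standard deviation 1

def normit(p):
    """Normal deviate whose lower-tail area is p, for 0 < p < 1."""
    return _STANDARD_NORMAL.inv_cdf(p)

def probit(p):
    """Probit taken as normit plus 5."""
    return normit(p) + 5.0

if __name__ == "__main__":
    for p in (0.3, 0.5, 0.7):
        print(p, normit(p), probit(p))
```

With doses scaled to -1, 0 and +1 and true probabilities 0.3, 0.5 and 0.7, the normits lie on a line through the origin whose slope is the normit of 0.7, about 0.5244.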

The integrated normal curve (1) has been advanced extensively in
recent years for use in bio-assay with “quantal response.” Here $P_i$ represents
the true probability of death (or some other all-or-none effect)
at $x = x_i$. In this statistical model it is assumed that $p_i$, the observed
proportion of individuals affected out of $n_i$ exposed at $x_i$, can be considered
a random variable binomially distributed around the true $P_i$,
with variance $\sigma_{p_i}^2 = P_i Q_i / n_i$.

For the estimation of the parameters $\alpha$, $\beta$ of (1), the method of
maximum likelihood has been advocated, on the basis of the optimum
asymptotic properties of the maximum likelihood estimate [10]. In this
paper is presented another estimate, formally analogous with the minimum
logit $\chi^2$ estimate of the logistic function,² called the “minimum
normit $\chi^2$ estimate.”

¹ “Normit” is intended as a diminutive for the “normal deviate” of Galton and Sheppard [17];
the widely used “probit” is equal to the normit plus 5. The normit is used in the present development
rather than the probit, because of the greater simplicity resulting from the use of the normit. Employment
of the probit would have necessitated doubling the size of the tables of this paper, involved larger
numbers in the calculations and, because of the arbitrary constant 5, confused mathematical analysis
[11] [24]. If “probit analysis” is the appropriate name for maximum likelihood estimation of the integrated
normal curve with the use of probits [10], the analogous name of “normit analysis” can be used
for estimation by minimum normit $\chi^2$, with the use of normits.

² The development of the minimum logit $\chi^2$ estimate of the logistic function, however, was somewhat
differently motivated, and that estimate has the property of sufficiency, which so far as I know, is
not possessed by estimates of the integrated normal curve.

The minimum normit $\chi^2$ estimate is defined by minimization of the
following quantity, called the “normit $\chi^2$,” which is asymptotically
distributed as $\chi^2$:

$$\chi^2(\text{normit}) = \sum n_i\, \frac{z_i^2}{p_i q_i}\, (\nu_i - \hat{\nu}_i)^2 \tag{3}$$

where $n_i$ is the number exposed at $x_i$, $p_i = 1 - q_i$ is the observed proportion
affected at $x_i$, $\nu_i$ is the observed normit at $x_i$, $\hat{\nu}_i$ is the estimate of
the normit, $z_i$ is the ordinate of the unit normal curve at the point where
its area is divided into $p_i$ and $q_i$, and is given by

$$z_i = \frac{1}{\sqrt{2\pi}}\, e^{-\nu_i^2/2}. \tag{4}$$

Since the minimum normit $\chi^2$ estimate is definitively calculated without
iteration, while the maximum likelihood estimate is obtainable only as
the limit of successive iterative procedures, the minimum normit $\chi^2$
estimate is much more easily determinable precisely than the maximum
likelihood estimate [1] [2].
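
Because the weights in (3) depend only on the observed proportions, minimizing (3) over a straight line in x is an ordinary weighted least-squares problem with a closed-form solution. The Python sketch below illustrates this under stated assumptions: the function names are hypothetical, and every dose is required to have an observed proportion strictly between 0 and 1, since the observed normit is otherwise infinite (the paper's treatment of such cases is not reproduced here).

```python
from statistics import NormalDist

_STD = NormalDist()

def _weights_and_normits(n, r):
    """Weights n*z^2/(p*q) and observed normits from numbers exposed and affected."""
    w, v = [], []
    for ni, ri in zip(n, r):
        if not 0 < ri < ni:
            raise ValueError("sketch requires 0 < r_i < n_i at every dose")
        p = ri / ni
        nu = _STD.inv_cdf(p)           # observed normit
        z = _STD.pdf(nu)               # ordinate, equation (4)
        w.append(ni * z * z / (p * (1.0 - p)))
        v.append(nu)
    return w, v

def normit_chi2(alpha, beta, x, n, r):
    """Quantity (3) evaluated for the line alpha + beta*x."""
    w, v = _weights_and_normits(n, r)
    return sum(wi * (vi - alpha - beta * xi) ** 2 for wi, vi, xi in zip(w, v, x))

def min_normit_chi2(x, n, r):
    """Weighted least-squares values of (alpha, beta) minimizing (3)."""
    w, v = _weights_and_normits(n, r)
    sw = sum(w)
    x_bar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    v_bar = sum(wi * vi for wi, vi in zip(w, v)) / sw
    s_xv = sum(wi * (xi - x_bar) * (vi - v_bar) for wi, xi, vi in zip(w, x, v))
    s_xx = sum(wi * (xi - x_bar) ** 2 for wi, xi in zip(w, x))
    beta = s_xv / s_xx
    alpha = v_bar - beta * x_bar
    return alpha, beta

if __name__ == "__main__":
    x, n, r = [-1.0, 0.0, 1.0], [10, 10, 10], [2, 6, 7]  # hypothetical data
    a, b = min_normit_chi2(x, n, r)
    print(a, b, normit_chi2(a, b, x, n, r))
```

No iteration is involved: one pass computes the weights and a second computes the two estimates, which is the practical advantage claimed over the maximum likelihood calculation.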

Recently Taylor [27] has demonstrated that the minimum normit x’ 
estimate is in the class of estimates R.B.A.N. of Neyman, and conse- 
quently has the same asymptotic properties as the maximum likelihood
estimate. The minimum normit χ² estimate, as well as the maximum
likelihood estimate, is therefore asymptotically efficient. If an
estimator is asymptotically efficient, this is widely accepted as
implying that with large samples the variance is minimum and given by
the reciprocal of E(∂ ln φ/∂θ)², where φ is the probability of a sample
and θ is the parameter estimated. Although, in strictly rigorous
mathematics, the properties associated with asymptotic efficiency imply
nothing necessarily for finite samples, the approximation for large
samples is applicable in many specific cases. Even where an implication
for finite samples is valid, what size sample is “large” and how close
the approximation is must be independently determined. In order to get
some idea for the present case, the following “large sample” experiment
was tried: Three equally spaced dosages x, corresponding respectively
to true P’s 0.3, 0.5, 0.7; n = 50 at each dose; β = 0.524401, considered
known, α = 0 to be estimated. A stratified random sample of 600 was
used.³ Both the maximum likelihood estimate and the minimum normit χ²
estimate of α are unbiased for this dosage arrangement. The variance of
the maximum likelihood estimate was determined as 0.011191; the variance
of the minimum normit χ² estimate was determined as 0.011186. The value
of the asymptotic variance is equal to the reciprocal of Σ(nZ²/PQ),
where n is the number exposed at the several dosage values, 50 in the
present example, P = 1 − Q is the true P at the dose levels, and Z is
the ordinate of the normal distribution function corresponding to the
integral P. For the present case, the asymptotic variance
σ_A² = 0.011186, so that the variance of the minimum normit χ² estimate
was found to be equal to the asymptotic value to the precision of six
decimal figures, and the variance of the maximum likelihood estimate
was found to be slightly larger than the asymptotic value. The extreme
closeness of the variance of the minimum normit χ² estimate to the
asymptotic value is impressive, and the finding of a somewhat larger
variance for the maximum likelihood estimate is in agreement with
experiments to be described below; but the present point is that with
n = 50 at each of three dose levels, it appears that both estimates have
attained their asymptotic distributions to a very close degree of
approximation, and we may reasonably expect that the situation is not
much different for other similar arrangements of dosages, and also with
both parameters to be estimated, when the total number in the
experiment is 150.

3 For the method of sampling used in this and other experiments presented, see reference [3]. For
the minimum normit χ² estimate the noniterative procedure of calculation described in the present text
was used; for the maximum likelihood estimate, the iterative method of probit analysis was used,
employing the Finney-Stevens tables [12] and also the method of Garwood as presented with auxiliary
tables by Cornfield and Mantel [7], supplemented by the W.P.A. tables [21].
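The asymptotic variance quoted in this paragraph can be checked numerically. The following is a minimal sketch and not part of the original computation, which relied on printed tables; the function name `asymptotic_variance` is an assumption introduced here, and the standard-library `statistics.NormalDist` stands in for the tabled normal ordinates.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal: mean 0, standard deviation 1


def asymptotic_variance(n_per_dose, true_ps):
    """Reciprocal of sum(n * Z^2 / (P * Q)) over the dose levels.

    Applies to the estimate of alpha alone, beta being known.
    """
    information = 0.0
    for p in true_ps:
        # Z is the normal ordinate at the normit (standard deviate) of P.
        z = STD_NORMAL.pdf(STD_NORMAL.inv_cdf(p))
        information += n_per_dose * z * z / (p * (1.0 - p))
    return 1.0 / information


if __name__ == "__main__":
    # The text reports 0.011186 for n = 50 at true P = 0.3, 0.5, 0.7.
    print(round(asymptotic_variance(50, [0.3, 0.5, 0.7]), 6))
```

The same function with n = 10 gives the quantity that the paper later denotes 1/I for the small-sample experiments.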

The finding of a greater variance for the maximum likelihood estimate
as compared with the minimum normit χ² estimate, in the experiment
with “large” n, as well as a similar previous finding for the analogous
minimum logit χ² estimate [3], suggested the desirability of comparing
the two estimators for “small samples.” A series of experiments was
carried out similar to those performed to compare the minimum logit χ²
estimate with the maximum likelihood estimate of the logistic function
[3], but not so extensive, each experiment here being based on a
stratified random sample of 600. Experiments were performed as for
three equally spaced doses, 10 at each dose; at dosage values scaled
−1, 0, +1, the respective values of P were taken as 0.3, 0.5, 0.7, which
defined the parameters as α = 0, β = 0.524401.

The program was in two sections: (1) β considered known, only α to be
estimated, and (2) α and β both to be estimated simultaneously. Eight
experiments were performed, one for the estimate of α and one for the
estimates of α and β, at each of the following values for central P:
0.5, 0.6, 0.7, and 0.8. The results are summarized in Table 1. It is
seen that in each of the experiments the error variance around the mean
and the mean-square-error from the true value of the parameter are
smaller for the minimum normit χ² estimate than for the maximum
likelihood estimate. This result is in agreement with the comparison of
the maximum likelihood estimate with the minimum logit χ² estimate for
the estimation of the logistic function reported previously [3], but
the difference in favor of the χ² estimate is generally greater in the
case of the logistic function than with the integrated normal curve.

Of noteworthy interest is a consideration of the variances in relation
to I, Fisher’s “amount of extractable information.” For estimation of
the single parameter α, the value of the asymptotic variance is also
that of the Cramér-Rao [8, 26] lower bound in finite samples for a
regular unbiased estimate, and is equal to 1/I. It has been noted in the
“large sample” experiment with dosages symmetrically disposed about
P = 0.5 that the variance of the minimum normit χ² estimate was found
to be practically equal to the asymptotic variance, while the variance
of the maximum likelihood estimate was slightly above it. For the same
experiment with small samples the variance of the minimum normit χ²
estimate in relation to 1/I is less than with large samples, so that
with small samples it is less than 1/I. On the other hand, the variance
of the maximum likelihood estimate in relation to 1/I increases with
decrease of sample size, so that it is greater than 1/I. The position of
the variance of the minimum normit χ² estimate below 1/I is maintained
for other dispositions of the dosages where the estimate is biased,
and even the mean-square-error is less than 1/I, while the variance of
the maximum likelihood estimate is everywhere above 1/I. Here again
relations are similar to those found with the analogous estimates of the
logistic function [3].
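The relation of the small-sample variance to 1/I can also be examined without sampling. The sketch below is an illustration under stated assumptions and is not the procedure of the paper: it replaces the stratified random sample of 600 by an exact enumeration of every binomial outcome for three doses with n = 10 and β known, and it uses the working values 1/2n and 1 − 1/2n for observed proportions of 0 and 1. All function names are hypothetical.

```python
from itertools import product
from math import comb
from statistics import NormalDist

STD_NORMAL = NormalDist()


def working_proportion(r, n):
    """Observed proportion, with 0 and 1 replaced by 1/2n and 1 - 1/2n."""
    if r == 0:
        return 1.0 / (2 * n)
    if r == n:
        return 1.0 - 1.0 / (2 * n)
    return r / n


def alpha_estimate(rs, n, xs, beta):
    """Minimum normit chi-square estimate of alpha when beta is known.

    Minimizing sum(n w (nu - alpha - beta x)^2) over alpha gives the
    weighted mean of (nu - beta x) with weights n w, w = z^2 / (p q).
    """
    num = den = 0.0
    for r, x in zip(rs, xs):
        p = working_proportion(r, n)
        nu = STD_NORMAL.inv_cdf(p)
        w = STD_NORMAL.pdf(nu) ** 2 / (p * (1.0 - p))
        num += n * w * (nu - beta * x)
        den += n * w
    return num / den


def exact_mean_and_variance(n, xs, true_ps, beta):
    """Enumerate all (n + 1)^k outcomes and weight by their probability."""
    mean = second_moment = 0.0
    for rs in product(range(n + 1), repeat=len(xs)):
        prob = 1.0
        for r, p in zip(rs, true_ps):
            prob *= comb(n, r) * p ** r * (1.0 - p) ** (n - r)
        est = alpha_estimate(rs, n, xs, beta)
        mean += prob * est
        second_moment += prob * est * est
    return mean, second_moment - mean * mean


def inverse_information(n, true_ps):
    """1/I for alpha alone: reciprocal of sum(n Z^2 / (P Q))."""
    total = 0.0
    for p in true_ps:
        z = STD_NORMAL.pdf(STD_NORMAL.inv_cdf(p))
        total += n * z * z / (p * (1.0 - p))
    return 1.0 / total


if __name__ == "__main__":
    xs, ps, beta = [-1, 0, 1], [0.3, 0.5, 0.7], 0.524401
    mean, var = exact_mean_and_variance(10, xs, ps, beta)
    print("mean", mean, "variance", var, "1/I", inverse_information(10, ps))
```

Because the enumeration is exact, its result need not coincide with the sampled figures of Table 1; it is offered only as a way of inspecting the same comparison.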


CALCULATION 
The normal equations for obtaining the estimates of α and β are

$$\sum n_i w_i (\nu_i - \hat{\nu}_i) = 0, \tag{5}$$

$$\sum n_i w_i x_i (\nu_i - \hat{\nu}_i) = 0, \tag{6}$$

where $w_i = z_i^2 / p_i q_i$, $\nu_i$ is the observed normit at $x_i$, and
$\hat{\nu}_i$ is the estimated value of the normit.

The evaluation of (5) and (6) leads to a procedure that amounts simply
to obtaining a least squares solution of the straight line

$$\nu_i = a + b x_i \tag{7}$$

with $n_i w_i$ as weight of the observation $\nu_i$. The values of a and b,
which are the estimates respectively of α, β, are given by

$$b = \frac{\sum n_i w_i x_i \nu_i - \dfrac{\sum n_i w_i x_i \, \sum n_i w_i \nu_i}{\sum n_i w_i}}{\sum n_i w_i x_i^2 - \dfrac{\left(\sum n_i w_i x_i\right)^2}{\sum n_i w_i}} = \frac{\sum n_i w_i (\nu_i - \bar{\nu})(x_i - \bar{x})}{\sum n_i w_i (x_i - \bar{x})^2}, \tag{8}$$

$$a = \frac{\sum n_i w_i \nu_i - b \sum n_i w_i x_i}{\sum n_i w_i} = \bar{\nu} - b\bar{x}, \tag{9}$$

where

$$\bar{x} = \frac{\sum n_i w_i x_i}{\sum n_i w_i}, \qquad \bar{\nu} = \frac{\sum n_i w_i \nu_i}{\sum n_i w_i}.$$

The E.D. 50 = γ, the dosage value x which produces 50 per cent effect,
is the value of $x_i$ in (1) for which $P_i = 0.5$, and is given by

$$\gamma = -\alpha/\beta. \tag{10}$$

The estimate of γ, represented by $x_{50}$, is given by

$$x_{50} = -a/b. \tag{11}$$

We may write the estimated normit linear equation as

$$\hat{\nu}_i = a + b x_i = a' + b(x_i - \bar{x}), \tag{12}$$

where $a' = \bar{\nu}$.

If x is measured as the logarithm of the dose D, then
$x_{50} = \log D_{50}$, where $x_{50}$ is the estimate of γ, the value of x
corresponding to a 50 per cent response, and $D_{50}$ is the estimate of
the actual dose producing this response. Formulas for variances of the
estimates of the parameters may be written as follows:⁴

$$s_{a'}^2 = \frac{1}{\sum nw}, \qquad s_b^2 = \frac{1}{\sum nw(x - \bar{x})^2},$$

$$s_a^2 = s_{a'}^2 + \bar{x}^2 s_b^2,$$

$$s_{x_{50}}^2 = \frac{1}{b^2}\left[s_{a'}^2 + s_b^2 (x_{50} - \bar{x})^2\right], \qquad s_{D_{50}}^2 = s_{x_{50}}^2 \left(\frac{D_{50}}{\log e}\right)^2.$$

An example of calculations is given in Table 2. 
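As a supplement to the tabular calculation, the noniterative procedure of equations (5) through (12) and the variance formulas can be sketched in code. This is a minimal illustration, not the author's worksheet: the function names and the data in the usage lines are hypothetical, normits are taken from the standard-library `statistics.NormalDist` rather than from Table 3, and observed proportions of 0 or 1 are replaced by the working values 1/2n and 1 − 1/2n described in the text.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()


def working_proportion(r, n):
    """Observed proportion, with 0 and 1 replaced by 1/2n and 1 - 1/2n."""
    if r == 0:
        return 1.0 / (2 * n)
    if r == n:
        return 1.0 - 1.0 / (2 * n)
    return r / n


def fit_normit_line(xs, ns, rs):
    """Weighted least squares fit of nu = a + b x with weights n w."""
    nus, weights = [], []
    for n, r in zip(ns, rs):
        p = working_proportion(r, n)
        nu = STD_NORMAL.inv_cdf(p)           # observed normit
        z = STD_NORMAL.pdf(nu)               # normal ordinate at the normit
        nus.append(nu)
        weights.append(n * z * z / (p * (1.0 - p)))  # n w, w = z^2 / (p q)

    total = sum(weights)
    x_bar = sum(k * x for k, x in zip(weights, xs)) / total
    nu_bar = sum(k * nu for k, nu in zip(weights, nus)) / total
    sxx = sum(k * (x - x_bar) ** 2 for k, x in zip(weights, xs))
    sxnu = sum(k * (x - x_bar) * (nu - nu_bar)
               for k, x, nu in zip(weights, xs, nus))

    b = sxnu / sxx                 # equation (8)
    a = nu_bar - b * x_bar         # equation (9)
    x50 = -a / b                   # equation (11)

    var_a_prime = 1.0 / total
    var_b = 1.0 / sxx
    var_a = var_a_prime + x_bar ** 2 * var_b
    var_x50 = (var_a_prime + var_b * (x50 - x_bar) ** 2) / b ** 2
    return {"a": a, "b": b, "x50": x50, "var_a_prime": var_a_prime,
            "var_a": var_a, "var_b": var_b, "var_x50": var_x50}


if __name__ == "__main__":
    # Hypothetical data: 10 subjects at each of three coded doses.
    result = fit_normit_line([-1, 0, 1], [10, 10, 10], [2, 5, 9])
    for name, value in result.items():
        print(name, round(value, 4))
```

If x is a common logarithm of dose, the estimate in dose units would follow as `10 ** x50`, with its standard error obtained from the last formula of the text.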





4 These are the asymptotic variances, with estimates obtained from observations replacing the true
values of the parameters. For development of the asymptotic variances, see references [10] and [18].
For a discussion of the limited usefulness of formulas for the standard errors of the estimates see references
[1, 5a].
Tables are provided for facilitation of the calculations. The
refinement to which the tables are carried was determined in relation to the
objective that the estimate should finally be written with a number of 
decimals corresponding to two significant figures in the standard error 
of the estimate. Several typical examples were tried to determine the 
minimum number of figures that should be carried in the tables, in 
order to assure a solution which is arithmetically correct to the number 
of figures required by this rule. One cannot be sure that the best de- 
cision was made; it is possible that fewer figures in the tables would 
have sufficed, and on the other hand that examples may be met for 
which the refinement of the tables is insufficient for the required pre- 
cision. 

Table 3, giving the normit for argument p, was constructed with the 
use of values given in The Kelley Statistical Tables [20]. A convenient 
table similar to that given here is Table 1 of Pearson’s Tables, Part I 
[22], in which, however, the normit is tabled only to four decimal 
places. The Kelley Statistical Tables list the argument to four decimal 
places, and the normit is given with eight decimal places. When the 
observed p is either zero or 100 per cent, a substitute working value is 
used equal respectively to 1/2n and 1 − 1/2n.⁵

It will be observed in equations (8) and (9), which are the formulas
for the estimates, that they contain w = z²/pq and wν, each of these
being a function of the observed proportion p. Table 4 gives both these
quantities for argument p. This table was constructed from values given
in Table II of Pearson’s Tables for Statisticians and Biometricians,
Part II [23].
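Where the printed tables are not at hand, the two tabled quantities can be computed directly. The sketch below is an illustration only; the function name `normit_weights` is an assumption, and the values come from the standard-library normal distribution rather than from Pearson's tables.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()


def normit_weights(p):
    """Return (w, w * nu) for an observed proportion p, 0 < p < 1.

    nu is the normit of p, z the normal ordinate at nu, and w = z^2 / (p q).
    """
    nu = STD_NORMAL.inv_cdf(p)
    z = STD_NORMAL.pdf(nu)
    w = z * z / (p * (1.0 - p))
    return w, w * nu


if __name__ == "__main__":
    for p in (0.1, 0.3, 0.5, 0.7, 0.9):
        w, w_nu = normit_weights(p)
        print(p, round(w, 4), round(w_nu, 4))
```

The weight is symmetric about p = 0.5, while wν changes sign there, which is the arrangement described in the legend of Table 4.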

When an equation has been “fitted” to observed data, it is good
practice to calculate the values “predicted” by the equation to compare
directly with the values observed. Table 5, which gives the area under
the normal frequency curve in a form convenient for the present
situation, can be used for such computation; it was calculated with the
use of values in the W.P.A. tables of the normal function [21]. Table 5
is used also for the direct calculation of the Pearson
χ² = Σ(o − e)²/e. Values for χ², to be used in tests of significance
employing available χ² tables, can be obtained more easily by
calculation of the normit χ² = Σnw(ν − ν̂)² as shown in Table 2.⁶ The χ²
tables consulted in accomplishing a test of significance represent the
asymptotic distribution of χ², and neither the Pearson χ² nor the
normit χ² calculated from finite samples follows this distribution
exactly. For both, the χ² distribution is approached asymptotically as
the n_i’s → ∞; however, so far as the present author is aware, it is not
known which χ², the Pearson χ² or the normit χ², for finite samples,
approximates more closely the distribution of the tabled χ².
Objectively, therefore, on the basis of available knowledge, there is
no reason for insisting on the Pearson χ² rather than the normit χ².
However, it would seem to be in the order of proper statistical
comportment to use the conventional Pearson χ² until another
formulation is shown to be preferable. For this reason, and also because
the Pearson χ² involves the calculation of the expected p̂, which is
itself of immediate interest, it is recommended that the Pearson χ² be
used.

5 For treatment of the case of zero survivors, see reference [1].
6 The normit χ² can be calculated directly from the formula
$$\chi^2 \text{ (normit)} = \sum n_i w_i (\nu_i - \bar{\nu})^2 - \frac{\left[\sum n_i w_i (\nu_i - \bar{\nu})(x_i - \bar{x})\right]^2}{\sum n_i w_i (x_i - \bar{x})^2},$$
but this is inadvisable because (1) it necessitates carrying an extra column (nwν²) in the computations,
not required for the estimates themselves; (2) it is very important, in considering the differences between
the observed values and the “predicted,” to note the particular observations that make the major
contributions to the total χ². Frequently a single discrepant “stray” observation may result in a large total
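The two χ² quantities discussed here can be computed side by side once a line has been fitted. The following is a hedged sketch with hypothetical function names and hypothetical data; the fitted values a and b are taken as given, and the standard-library normal distribution replaces Table 5. The normit χ² is returned dose by dose so that the individual contributions remain visible.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()


def pearson_chi2(xs, ns, rs, a, b):
    """Sum of (o - e)^2 / e over responders and non-responders at each dose."""
    total = 0.0
    for x, n, r in zip(xs, ns, rs):
        p_hat = STD_NORMAL.cdf(a + b * x)        # predicted proportion
        exp_resp = n * p_hat
        exp_non = n * (1.0 - p_hat)
        total += (r - exp_resp) ** 2 / exp_resp
        total += ((n - r) - exp_non) ** 2 / exp_non
    return total


def normit_chi2_contributions(xs, ns, rs, a, b):
    """Per-dose terms n w (nu - nu_hat)^2; their sum is the normit chi-square."""
    terms = []
    for x, n, r in zip(xs, ns, rs):
        # Working values 1/2n and 1 - 1/2n for observed 0 and 100 per cent.
        p = min(max(r / n, 1.0 / (2 * n)), 1.0 - 1.0 / (2 * n))
        nu = STD_NORMAL.inv_cdf(p)
        w = STD_NORMAL.pdf(nu) ** 2 / (p * (1.0 - p))
        terms.append(n * w * (nu - (a + b * x)) ** 2)
    return terms


if __name__ == "__main__":
    # Hypothetical data and a hypothetical fitted line.
    xs, ns, rs = [-1, 0, 1], [10, 10, 10], [2, 5, 9]
    a, b = 0.1, 0.9
    terms = normit_chi2_contributions(xs, ns, rs, a, b)
    print("Pearson chi-square:", pearson_chi2(xs, ns, rs, a, b))
    print("normit chi-square:", sum(terms), "by dose:", terms)
```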


NOTATION ON THE HISTORY OF NORMIT ANALYSIS 


Gaddum [14], in his original (1933) presentation of the application of 
the integrated normal curve to bio-assay, utilized the same weights 
based on the observed relative frequencies as employed in this paper. 
Indeed he has specifically advocated these weights in preference to 
those required by maximum likelihood, on the grounds that the esti- 
mates are much simpler to obtain in this way and that the results are 
not much different. Here Gaddum, who in some circumstances has in- 
sisted on “logic” in what is advanced in these matters [16], has not been 
entirely logical. At the same time that he has urged the use of the ob- 
served weights for simplicity he advanced his opinion that maximum 
likelihood yields the best estimate of the normit line [15]. It is pertinent 
to ask, if a simple solution is desired, even one which admittedly is 
statistically inferior, why use any weights at all,—that is, why not use 
unit weight for all observations? Such a system of weighting yields 
consistent estimates [25], and they too usually are very close to those of 
maximum likelihood. 

I have made no special investigation regarding “priority” in the use
of the observed weights z²/pq. Urban (1910) [28], cited by Gaddum,
developed the method outlined here (except for the treatment of zero
survivors, and with a different motivation) and provided a table of the
weights z²/pq as well as a table of normits.⁷ The method of normit
analysis of this paper may therefore be regarded as the method of
Urban, modified.

The method of Urban apparently was in wide use for many years and
was incorporated in at least one standard textbook [19]. Since about
1938 it has been generally replaced by what has developed into a
formidable statistical discipline now widely known under the name of
“probit analysis” [5, 6, 9, 10, 13].

6 (continued) χ², which if the formula is used and therefore the individual contributions to the χ² are
not apparent, may lead to a conclusion that the data as a whole are “heterogeneous.”
7 In terms of the normal curve as it is formulated in the physical sciences.

The dispossession of the method of Urban was effected, not on the 
basis of an objective examination, mathematical or experimental, of the 
merits of the established procedure compared with those of the new 
candidate, but solely, it appears, on the authority of a cryptically ex- 
pressed opinion of Sir Ronald A. Fisher [4]. The findings reported in 
the present study indicate that not only is the older method much 
easier but, judged on the basis of variance and mean square of the 
resulting estimate, it is better. 
TABLE 1
SAMPLING ERRORS OF MAXIMUM LIKELIHOOD AND MINIMUM NORMIT χ² ESTIMATES

ML = maximum likelihood estimate; MNχ² = minimum normit χ² estimate.

(1) Estimate of α, β being known

 True P at dose     |       Bias        |    Variance     | Mean-square-error |
 Low  | Mid | High  |   ML    |  MNχ²   |   ML   |  MNχ²  |   ML   |  MNχ²    |  1/I
 .3   | .5  | .7    | -.0003* | -.0013* | .0585  | .0538  | .0585  | .0538    | .0559
 .393 | .6  | .782  |  .0051  | -.0095  | .0623  | .0560  | .0623  | .0561    | .0571
 .5   | .7  | .853  | -.0156  | -.0169  | .0680  | .0592  | .0682  | .0595    | .0612
 .624 | .8  | .914  |  .0265  | -.0376  | .0898  | .0644  | .0905  | .0658    | .0706

(2) α and β to be estimated

Estimate of α

 True P at dose     |       Bias        |    Variance     | Mean-square-error
 Low  | Mid | High  |   ML    |  MNχ²   |   ML   |  MNχ²  |   ML   |  MNχ²
 .3   | .5  | .7    |  .0003* | -.0003* | .0675  | .0581  | .0675  | .0581
 .393 | .6  | .782  | -.0054  | -.0101  | .0888  | .0825  | .0888  | .0826
 .5   | .7  | .853  | -.0126  | -.0039  | .1502  | .1424  | .1504  | .1424
 .624 | .8  | .914  | -.0189  |  .0358  | .3109  | .2682  | .3113  | .2695

Estimate of β

 True P at dose     |       Bias        |    Variance     | Mean-square-error
 Low  | Mid | High  |   ML    |  MNχ²   |   ML   |  MNχ²  |   ML   |  MNχ²
 .3   | .5  | .7    |  .0449  |  .0272  | .1060  | .0930  | .1080  | .0937
 .393 | .6  | .782  |  .0525  |  .0251  | .1157  | .0930  | .1185  | .0936
 .5   | .7  | .853  |  .0553  |  .0010  | .1164  | .0815  | .1195  | .0815
 .624 | .8  | .914  | -.0230  | -.0518  | .1227  | .0758  | .1232  | .0785

* The estimates for this disposition of dosages are unbiased; the indicated values are sampling
errors.

TABLE 4
NORMIT WEIGHTS

Upper figure in table is w = z²/pq; lower figure is wν.
For p less than .50 on left, wν is negative. For p greater than .50 on right,
wν is positive.

[Body of Table 4 is not recoverable in this copy.]

TABLE 5
ANTINORMITS*

 ν   | Thousandths                                                                           | ν
     | 0      | 1      | 2      | 3      | 4      | 5      | 6      | 7      | 8      | 9      |
 .00 | .50000 | .50040 | .50080 | .50120 | .50160 | .50199 | .50239 | .50279 | .50319 | .50359 | .00
 .01 | .50399 | .50439 | .50479 | .50519 | .50559 | .50598 | .50638 | .50678 | .50718 | .50758 | .01
 .02 | .50798 | .50838 | .50878 | .50918 | .50957 | .50997 | .51037 | .51077 | .51117 | .51157 | .02
 .03 | .51197 | .51237 | .51276 | .51316 | .51356 | .51396 | .51436 | .51476 | .51516 | .51556 | .03
 .04 | .51595 | .51635 | .51675 | .51715 | .51755 | .51795 | .51835 | .51874 | .51914 | .51954 | .04
 .05 | .51994 | .52034 | .52074 | .52113 | .52153 | .52193 | .52233 | .52273 | .52313 | .52352 | .05
 .06 | .52392 | .52432 | .52472 | .52512 | .52552 | .52591 | .52631 | .52671 | .52711 | .52751 | .06
 .07 | .52790 | .52830 | .52870 | .52910 | .52950 | .52989 | .53029 | .53069 | .53109 | .53148 | .07
 .08 | .53188 | .53228 | .53268 | .53307 | .53347 | .53387 | .53427 | .53466 | .53506 | .53546 | .08
 .09 | .53586 | .53625 | .53665 | .53705 | .53745 | .53784 | .53824 | .53864 | .53903 | .53943 | .09
 .10 | .53983 | .54023 | .54062 | .54102 | .54142 | .54181 | .54221 | .54261 | .54300 | .54340 | .10
 .11 | .54380 | .54419 | .54459 | .54498 | .54538 | .54578 | .54617 | .54657 | .54696 | .54736 | .11
 .12 | .54776 | .54815 | .54855 | .54895 | .54934 | .54974 | .55013 | .55053 | .55093 | .55132 | .12
 .13 | .55172 | .55211 | .55251 | .55290 | .55330 | .55369 | .55409 | .55448 | .55488 | .55528 | .13
 .14 | .55567 | .55607 | .55646 | .55686 | .55725 | .55764 | .55804 | .55843 | .55883 | .55922 | .14
 .15 | .55962 | .56001 | .56041 | .56080 | .56120 | .56159 | .56198 | .56238 | .56277 | .56317 | .15
 .16 | .56356 | .56395 | .56435 | .56474 | .56513 | .56553 | .56592 | .56632 | .56671 | .56710 | .16
 .17 | .56750 | .56789 | .56828 | .56867 | .56907 | .56946 | .56985 | .57025 | .57064 | .57103 | .17
 .18 | .57142 | .57182 | .57221 | .57260 | .57299 | .57339 | .57378 | .57417 | .57456 | .57495 | .18
 .19 | .57535 | .57574 | .57613 | .57652 | .57691 | .57730 | .57770 | .57809 | .57848 | .57887 | .19
 .20 | .57926 | .57965 | .58004 | .58043 | .58082 | .58121 | .58160 | .58200 | .58239 | .58278 | .20
 .21 | .58317 | .58356 | .58395 | .58434 | .58473 | .58512 | .58551 | .58590 | .58629 | .58668 | .21
 .22 | .58706 | .58745 | .58784 | .58823 | .58862 | .58901 | .58940 | .58979 | .59018 | .59057 | .22
 .23 | .59095 | .59134 | .59173 | .59212 | .59251 | .59290 | .59328 | .59367 | .59406 | .59445 | .23
 .24 | .59484 | .59522 | .59561 | .59600 | .59638 | .59677 | .59716 | .59755 | .59793 | .59832 | .24
 .25 | .59871 | .59909 | .59948 | .59987 | .60025 | .60064 | .60102 | .60141 | .60180 | .60218 | .25
 .26 | .60257 | .60295 | .60334 | .60372 | .60411 | .60450 | .60488 | .60527 | .60565 | .60604 | .26
 .27 | .60642 | .60680 | .60719 | .60757 | .60796 | .60834 | .60873 | .60911 | .60949 | .60988 | .27
 .28 | .61026 | .61065 | .61103 | .61141 | .61180 | .61218 | .61256 | .61294 | .61333 | .61371 | .28
 .29 | .61409 | .61447 | .61486 | .61524 | .61562 | .61600 | .61639 | .61677 | .61715 | .61753 | .29
 .30 | .61791 | .61829 | .61867 | .61905 | .61944 | .61982 | .62020 | .62058 | .62096 | .62134 | .30
 .31 | .62172 | .62210 | .62248 | .62286 | .62324 | .62362 | .62400 | .62438 | .62476 | .62514 | .31
 .32 | .62552 | .62590 | .62627 | .62665 | .62703 | .62741 | .62779 | .62817 | .62854 | .62892 | .32
 .33 | .62930 | .62968 | .63006 | .63043 | .63081 | .63119 | .63156 | .63194 | .63232 | .63270 | .33
 .34 | .63307 | .63345 | .63382 | .63420 | .63458 | .63495 | .63533 | .63570 | .63608 | .63646 | .34
 .35 | .63683 | .63721 | .63758 | .63796 | .63833 | .63871 | .63908 | .63945 | .63983 | .64020 | .35
 .36 | .64058 | .64095 | .64132 | .64170 | .64207 | .64244 | .64282 | .64319 | .64356 | .64394 | .36
 .37 | .64431 | .64468 | .64505 | .64543 | .64580 | .64617 | .64654 | .64691 | .64728 | .64766 | .37
 .38 | .64803 | .64840 | .64877 | .64914 | .64951 | .64988 | .65025 | .65062 | .65099 | .65136 | .38
 .39 | .65173 | .65210 | .65247 | .65284 | .65321 | .65358 | .65395 | .65432 | .65469 | .65505 | .39
 .40 | .65542 | .65579 | .65616 | .65653 | .65689 | .65726 | .65763 | .65800 | .65836 | .65873 | .40
 .41 | .65910 | .65946 | .65983 | .66020 | .66056 | .66093 | .66130 | .66166 | .66203 | .66239 | .41
 .42 | .66276 | .66312 | .66349 | .66385 | .66422 | .66458 | .66495 | .66531 | .66567 | .66604 | .42
 .43 | .66640 | .66677 | .66713 | .66749 | .66786 | .66822 | .66858 | .66894 | .66931 | .66967 | .43
 .44 | .67003 | .67039 | .67076 | .67112 | .67148 | .67184 | .67220 | .67256 | .67292 | .67328 | .44
 .45 | .67365 | .67401 | .67437 | .67473 | .67509 | .67545 | .67581 | .67616 | .67652 | .67688 | .45
 .46 | .67724 | .67760 | .67796 | .67832 | .67868 | .67903 | .67939 | .67975 | .68011 | .68047 | .46
 .47 | .68082 | .68118 | .68154 | .68189 | .68225 | .68261 | .68296 | .68332 | .68368 | .68403 | .47
 .48 | .68439 | .68474 | .68510 | .68545 | .68581 | .68616 | .68652 | .68687 | .68723 | .68758 | .48
 .49 | .68793 | .68829 | .68864 | .68899 | .68935 | .68970 | .69005 | .69041 | .69076 | .69111 | .49
     | 0      | 1      | 2      | 3      | 4      | 5      | 6      | 7      | 8      | 9      |

* The table gives the value of p for a specified value of the normit ν. If ν is negative, p is 1 minus the tabled value.
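For a value of ν that falls between or beyond the tabled arguments, the antinormit can be computed directly as the standard normal integral. The short sketch below is an illustration only; the function name `antinormit` is an assumption, and the standard-library `statistics.NormalDist` takes the place of the W.P.A. tables.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()


def antinormit(nu):
    """Proportion p corresponding to the normit nu (area under the normal curve)."""
    return STD_NORMAL.cdf(nu)


if __name__ == "__main__":
    # Compare with the tabled entry for nu = .250, which reads .59871.
    print(round(antinormit(0.25), 5))
    # For negative nu the table's rule is p = 1 minus the tabled value.
    print(round(antinormit(-0.25), 5), round(1.0 - antinormit(0.25), 5))
```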





TABLE 5 (Continued)
ANTINORMITS

 ν   | Thousandths                                                                           | ν
     | 0      | 1      | 2      | 3      | 4      | 5      | 6      | 7      | 8      | 9      |
 .50 | .69146 | .69181 | .69217 | .69252 | .69287 | .69322 | .69357 | .69392 | .69427 | .69462 | .50
 .51 | .69497 | .69532 | .69567 | .69602 | .69637 | .69672 | .69707 | .69742 | .69777 | .69812 | .51
 .52 | .69847 | .69882 | .69917 | .69951 | .69986 | .70021 | .70056 | .70090 | .70125 | .70160 | .52
 .53 | .70194 | .70229 | .70264 | .70298 | .70333 | .70368 | .70402 | .70437 | .70471 | .70506 | .53
 .54 | .70540 | .70575 | .70609 | .70644 | .70678 | .70712 | .70747 | .70781 | .70815 | .70850 | .54
 .55 | .70884 | .70918 | .70953 | .70987 | .71021 | .71055 | .71089 | .71124 | .71158 | .71192 | .55
 .56 | .71226 | .71260 | .71294 | .71328 | .71362 | .71396 | .71430 | .71464 | .71498 | .71532 | .56
 .57 | .71566 | .71600 | .71634 | .71668 | .71702 | .71735 | .71769 | .71803 | .71837 | .71871 | .57
 .58 | .71904 | .71938 | .71972 | .72005 | .72039 | .72073 | .72106 | .72140 | .72173 | .72207 | .58
 .59 | .72240 | .72274 | .72307 | .72341 | .72374 | .72408 | .72441 | .72475 | .72508 | .72541 | .59
 .60 | .72575 | .72608 | .72641 | .72675 | .72708 | .72741 | .72774 | .72807 | .72841 | .72874 | .60
 .61 | .72907 | .72940 | .72973 | .73006 | .73039 | .73072 | .73105 | .73138 | .73171 | .73204 | .61
 .62 | .73237 | .73270 | .73303 | .73336 | .73369 | .73401 | .73434 | .73467 | .73500 | .73533 | .62
 .63 | .73565 | .73598 | .73631 | .73663 | .73696 | .73729 | .73761 | .73794 | .73826 | .73859 | .63
 .64 | .73891 | .73924 | .73956 | .73989 | .74021 | .74054 | .74086 | .74118 | .74151 | .74183 | .64
 .65 | .74215 | .74248 | .74280 | .74312 | .74344 | .74377 | .74409 | .74441 | .74473 | .74505 | .65
 .66 | .74537 | .74569 | .74601 | .74633 | .74666 | .74698 | .74729 | .74761 | .74793 | .74825 | .66
 .67 | .74857 | .74889 | .74921 | .74953 | .74984 | .75016 | .75048 | .75080 | .75111 | .75143 | .67
 .68 | .75175 | .75206 | .75238 | .75270 | .75301 | .75333 | .75364 | .75396 | .75427 | .75459 | .68
 .69 | .75490 | .75522 | .75553 | .75585 | .75616 | .75647 | .75679 | .75710 | .75741 | .75772 | .69
 .70 | .75804 | .75835 | .75866 | .75897 | .75928 | .75960 | .75991 | .76022 | .76053 | .76084 | .70
 .71 | .76115 | .76146 | .76177 | .76208 | .76239 | .76270 | .76300 | .76331 | .76362 | .76393 | .71
 .72 | .76424 | .76455 | .76485 | .76516 | .76547 | .76577 | .76608 | .76639 | .76669 | .76700 | .72
 .73 | .76731 | .76761 | .76792 | .76822 | .76853 | .76883 | .76913 | .76944 | .76974 | .77005 | .73
 .74 | .77035 | .77065 | .77096 | .77126 | .77156 | .77186 | .77217 | .77247 | .77277 | .77307 | .74
 .75 | .77337 | .77367 | .77397 | .77428 | .77458 | .77488 | .77518 | .77548 | .77577 | .77607 | .75
 .76 | .77637 | .77667 | .77697 | .77727 | .77757 | .77786 | .77816 | .77846 | .77876 | .77905 | .76
 .77 | .77935 | .77965 | .77994 | .78024 | .78053 | .78083 | .78113 | .78142 | .78172 | .78201 | .77
 .78 | .78230 | .78260 | .78289 | .78319 | .78348 | .78377 | .78407 | .78436 | .78465 | .78494 | .78
 .79 | .78524 | .78553 | .78582 | .78611 | .78640 | .78669 | .78698 | .78727 | .78757 | .78786 | .79
 .80 | .78814 | .78843 | .78872 | .78901 | .78930 | .78959 | .78988 | .79017 | .79045 | .79074 | .80
 .81 | .79103 | .79132 | .79160 | .79189 | .79218 | .79246 | .79275 | .79304 | .79332 | .79361 | .81
 .82 | .79389 | .79418 | .79446 | .79475 | .79503 | .79531 | .79560 | .79588 | .79617 | .79645 | .82
 .83 | .79673 | .79701 | .79730 | .79758 | .79786 | .79814 | .79842 | .79870 | .79898 | .79927 | .83
 .84 | .79955 | .79983 | .80011 | .80039 | .80067 | .80094 | .80122 | .80150 | .80178 | .80206 | .84
 .85 | .80234 | .80262 | .80289 | .80317 | .80345 | .80372 | .80400 | .80428 | .80455 | .80483 | .85
 .86 | .80511 | .80538 | .80566 | .80593 | .80621 | .80648 | .80676 | .80703 | .80730 | .80758 | .86
 .87 | .80785 | .80812 | .80840 | .80867 | .80894 | .80921 | .80949 | .80976 | .81003 | .81030 | .87
 .88 | .81057 | .81084 | .81111 | .81138 | .81165 | .81192 | .81219 | .81246 | .81273 | .81300 | .88
 .89 | .81327 | .81354 | .81380 | .81407 | .81434 | .81461 | .81487 | .81514 | .81541 | .81567 | .89
 .90 | .81594 | .81621 | .81647 | .81674 | .81700 | .81727 | .81753 | .81780 | .81806 | .81833 | .90
 .91 | .81859 | .81885 | .81912 | .81938 | .81964 | .81990 | .82017 | .82043 | .82069 | .82095 | .91
 .92 | .82121 | .82148 | .82174 | .82200 | .82226 | .82252 | .82278 | .82304 | .82330 | .82356 | .92
 .93 | .82381 | .82407 | .82433 | .82459 | .82485 | .82511 | .82536 | .82562 | .82588 | .82613 | .93
 .94 | .82639 | .82665 | .82690 | .82716 | .82742 | .82767 | .82793 | .82818 | .82844 | .82869 | .94
 .95 | .82894 | .82920 | .82945 | .82971 | .82996 | .83021 | .83046 | .83072 | .83097 | .83122 | .95
 .96 | .83147 | .83172 | .83198 | .83223 | .83248 | .83273 | .83298 | .83323 | .83348 | .83373 | .96
 .97 | .83398 | .83423 | .83447 | .83472 | .83497 | .83522 | .83547 | .83572 | .83596 | .83621 | .97
 .98 | .83646 | .83670 | .83695 | .83720 | .83744 | .83769 | .83793 | .83818 | .83842 | .83867 | .98
 .99 | .83891 | .83916 | .83940 | .83965 | .83989 | .84013 | .84038 | .84062 | .84086 | .84110 | .99
     | 0      | 1      | 2      | 3      | 4      | 5      | 6      | 7      | 8      | 9      |

TABLE 5 (Continued)
ANTINORMITS

 ν    | Thousandths                                                                           | ν
      | 0      | 1      | 2      | 3      | 4      | 5      | 6      | 7      | 8      | 9      |
 1.00 | .84134 | .84159 | .84183 | .84207 | .84231 | .84255 | .84279 | .84303 | .84327 | .84351 | 1.00
 1.01 | .84375 | .84399 | .84423 | .84447 | .84471 | .84495 | .84519 | .84542 | .84566 | .84590 | 1.01
 1.02 | .84614 | .84637 | .84661 | .84685 | .84708 | .84732 | .84755 | .84779 | .84802 | .84826 | 1.02
 1.03 | .84850 | .84873 | .84896 | .84920 | .84943 | .84967 | .84990 | .85013 | .85037 | .85060 | 1.03
 1.04 | .85083 | .85106 | .85129 | .85153 | .85176 | .85199 | .85222 | .85245 | .85268 | .85291 | 1.04
 1.05 | .85314 | .85337 | .85360 | .85383 | .85406 | .85429 | .85452 | .85474 | .85497 | .85520 | 1.05
 1.06 | .85543 | .85566 | .85588 | .85611 | .85634 | .85656 | .85679 | .85701 | .85724 | .85747 | 1.06
 1.07 | .85769 | .85792 | .85814 | .85836 | .85859 | .85881 | .85904 | .85926 | .85948 | .85971 | 1.07
 1.08 | .85993 | .86015 | .86037 | .86060 | .86082 | .86104 | .86126 | .86148 | .86170 | .86192 | 1.08
 1.09 | .86214 | .86236 | .86258 | .86280 | .86302 | .86324 | .86346 | .86368 | .86390 | .86412 | 1.09
 1.10 | .86433 | .86455 | .86477 | .86499 | .86520 | .86542 | .86564 | .86585 | .86607 | .86629 | 1.10
 1.11 | .86650 | .86672 | .86693 | .86715 | .86736 | .86758 | .86779 | .86800 | .86822 | .86843 | 1.11
 1.12 | .86864 | .86886 | .86907 | .86928 | .86949 | .86971 | .86992 | .87013 | .87034 | .87055 | 1.12
 1.13 | .87076 | .87097 | .87118 | .87139 | .87160 | .87181 | .87202 | .87223 | .87244 | .87265 | 1.13
 1.14 | .87286 | .87307 | .87327 | .87348 | .87369 | .87390 | .87410 | .87431 | .87452 | .87472 | 1.14
 1.15 | .87493 | .87513 | .87534 | .87555 | .87575 | .87596 | .87616 | .87636 | .87657 | .87677 | 1.15
 1.16 | .87698 | .87718 | .87738 | .87759 | .87779 | .87799 | .87819 | .87840 | .87860 | .87880 | 1.16
 1.17 | .87900 | .87920 | .87940 | .87960 | .87980 | .88000 | .88020 | .88040 | .88060 | .88080 | 1.17
 1.18 | .88100 | .88120 | .88140 | .88160 | .88179 | .88199 | .88219 | .88239 | .88258 | .88278 | 1.18
 1.19 | .88298 | .88317 | .88337 | .88357 | .88376 | .88396 | .88415 | .88435 | .88454 | .88474 | 1.19
 1.20 | .88493 | .88512 | .88532 | .88551 | .88571 | .88590 | .88609 | .88628 | .88648 | .88667 | 1.20
 1.21 | .88686 | .88705 | .88724 | .88744 | .88763 | .88782 | .88801 | .88820 | .88839 | .88858 | 1.21
 1.22 | .88877 | .88896 | .88915 | .88934 | .88952 | .88971 | .88990 | .89009 | .89028 | .89046 | 1.22
 1.23 | .89065 | .89084 | .89103 | .89121 | .89140 | .89158 | .89177 | .89196 | .89214 | .89233 | 1.23
 1.24 | .89251 | .89270 | .89288 | .89307 | .89325 | .89343 | .89362 | .89380 | .89398 | .89417 | 1.24
 1.25 | .89435 | .89453 | .89472 | .89490 | .89508 | .89526 | .89544 | .89562 | .89580 | .89599 | 1.25
 1.26 | .89617 | .89635 | .89653 | .89671 | .89689 | .89706 | .89724 | .89742 | .89760 | .89778 | 1.26
 1.27 | .89796 | .89814 | .89831 | .89849 | .89867 | .89885 | .89902 | .89920 | .89938 | .89955 | 1.27
 1.28 | .89973 | .89990 | .90008 | .90025 | .90043 | .90060 | .90078 | .90095 | .90113 | .90130 | 1.28
 1.29 | .90147 | .90165 | .90182 | .90199 | .90217 | .90234 | .90251 | .90268 | .90286 | .90303 | 1.29
 1.30 | .90320 | .90337 | .90354 | .90371 | .90388 | .90405 | .90422 | .90439 | .90456 | .90473 | 1.30
 1.31 | .90490 | .90507 | .90524 | .90541 | .90558 | .90575 | .90591 | .90608 | .90625 | .90642 | 1.31
 1.32 | .90658 | .90675 | .90692 | .90708 | .90725 | .90741 | .90758 | .90775 | .90791 | .90808 | 1.32
 1.33 | .90824 | .90841 | .90857 | .90873 | .90890 | .90906 | .90923 | .90939 | .90955 | .90971 | 1.33
 1.34 | .90988 | .91004 | .91020 | .91036 | .91053 | .91069 | .91085 | .91101 | .91117 | .91133 | 1.34
 1.35 | .91149 | .91165 | .91181 | .91197 | .91213 | .91229 | .91245 | .91261 | .91277 | .91293 | 1.35
 1.36 | .91309 | .91324 | .91340 | .91356 | .91372 | .91387 | .91403 | .91419 | .91434 | .91450 | 1.36
 1.37 | .91466 | .91481 | .91497 | .91512 | .91528 | .91543 | .91559 | .91574 | .91590 | .91605 | 1.37
 1.38 | .91621 | .91636 | .91651 | .91667 | .91682 | .91697 | .91713 | .91728 | .91743 | .91758 | 1.38
 1.39 | .91774 | .91789 | .91804 | .91819 | .91834 | .91849 | .91864 | .91879 | .91894 | .91909 | 1.39
 1.40 | .91924 | .91939 | .91954 | .91969 | .91984 | .91999 | .92014 | .92029 | .92043 | .92058 | 1.40
 1.41 | .92073 | .92088 | .92103 | .92117 | .92132 | .92147 | .92161 | .92176 | .92190 | .92205 | 1.41
 1.42 | .92220 | .92234 | .92249 | .92263 | .92278 | .92292 | .92307 | .92321 | .92335 | .92350 | 1.42
 1.43 | .92364 | .92379 | .92393 | .92407 | .92421 | .92436 | .92450 | .92464 | .92478 | .92492 | 1.43
 1.44 | .92507 | .92521 | .92535 | .92549 | .92563 | .92577 | .92591 | .92605 | .92619 | .92633 | 1.44
 1.45 | .92647 | .92661 | .92675 | .92689 | .92703 | .92717 | .92730 | .92744 | .92758 | .92772 | 1.45
 1.46 | .92786 | .92799 | .92813 | .92827 | .92840 | .92854 | .92868 | .92881 | .92895 | .92908 | 1.46
 1.47 | .92922 | .92935 | .92949 | .92962 | .92976 | .92989 | .93003 | .93016 | .93030 | .93043 | 1.47
 1.48 | .93056 | .93070 | .93083 | .93096 | .93110 | .93123 | .93136 | .93149 | .93162 | .93176 | 1.48
 1.49 | .93189 | .93202 | .93215 | .93228 | .93241 | .93254 | .93267 | .93280 | .93293 | .93306 | 1.49
      | 0      | 1      | 2      | 3      | 4      | 5      | 6      | 7      | 8      | 9      |

TABLE 5—Continued 


























ANTINORMITS
Thousandths
     | 0      | 1      | 2      | 3      | 4      | 5      | 6      | 7      | 8      | 9      |
1.50 | .93319 | .93332 | .93345 | .93358 | .93371 | .93384 | .93397 | .93409 | .93422 | .93435 | 1.50
1.51 | .93448 | .93461 | .93473 | .93486 | .93499 | .93511 | .93524 | .93537 | .93549 | .93562 | 1.51
1.52 | .93574 | .93587 | .93600 | .93612 | .93625 | .93637 | .93650 | .93662 | .93674 | .93687 | 1.52
1.53 | .93699 | .93712 | .93724 | .93736 | .93749 | .93761 | .93773 | .93785 | .93798 | .93810 | 1.53
1.54 | .93822 | .93834 | .93846 | .93858 | .93871 | .93883 | .93895 | .93907 | .93919 | .93931 | 1.54
1.55 | .93943 | .93955 | .93967 | .93979 | .93991 | .94003 | .94015 | .94027 | .94038 | .94050 | 1.55
1.56 | .94062 | .94074 | .94086 | .94097 | .94109 | .94121 | .94133 | .94144 | .94156 | .94168 | 1.56
1.57 | .94179 | .94191 | .94202 | .94214 | .94226 | .94237 | .94249 | .94260 | .94272 | .94283 | 1.57
1.58 | .94295 | .94306 | .94318 | .94329 | .94340 | .94352 | .94363 | .94374 | .94386 | .94397 | 1.58
1.59 | .94408 | .94420 | .94431 | .94442 | .94453 | .94464 | .94476 | .94487 | .94498 | .94509 | 1.59
1.60 | .94520 | .94531 | .94542 | .94553 | .94564 | .94575 | .94586 | .94597 | .94608 | .94619 | 1.60
1.61 | .94630 | .94641 | .94652 | .94663 | .94674 | .94684 | .94695 | .94706 | .94717 | .94728 | 1.61
1.62 | .94738 | .94749 | .94760 | .94771 | .94781 | .94792 | .94803 | .94813 | .94824 | .94834 | 1.62
1.63 | .94845 | .94856 | .94866 | .94877 | .94887 | .94898 | .94908 | .94919 | .94929 | .94939 | 1.63
1.64 | .94950 | .94960 | .94971 | .94981 | .94991 | .95002 | .95012 | .95022 | .95032 | .95043 | 1.64
1.65 | .95053 | .95063 | .95073 | .95083 | .95094 | .95104 | .95114 | .95124 | .95134 | .95144 | 1.65
1.66 | .95154 | .95164 | .95174 | .95184 | .95194 | .95204 | .95214 | .95224 | .95234 | .95244 | 1.66
1.67 | .95254 | .95264 | .95274 | .95284 | .95293 | .95303 | .95313 | .95323 | .95333 | .95342 | 1.67
1.68 | .95352 | .95362 | .95372 | .95381 | .95391 | .95401 | .95410 | .95420 | .95429 | .95439 | 1.68
1.69 | .95449 | .95458 | .95468 | .95477 | .95487 | .95496 | .95506 | .95515 | .95525 | .95534 | 1.69
1.70 | .95543 | .95553 | .95562 | .95572 | .95581 | .95590 | .95600 | .95609 | .95618 | .95627 | 1.70
1.71 | .95637 | .95646 | .95655 | .95664 | .95674 | .95683 | .95692 | .95701 | .95710 | .95719 | 1.71
1.72 | .95728 | .95737 | .95747 | .95756 | .95765 | .95774 | .95783 | .95792 | .95801 | .95810 | 1.72
1.73 | .95819 | .95827 | .95836 | .95845 | .95854 | .95863 | .95872 | .95881 | .95889 | .95898 | 1.73
1.74 | .95907 | .95916 | .95925 | .95933 | .95942 | .95951 | .95959 | .95968 | .95977 | .95985 | 1.74
1.75 | .95994 | .96003 | .96011 | .96020 | .96028 | .96037 | .96046 | .96054 | .96063 | .96071 | 1.75
1.76 | .96080 | .96088 | .96097 | .96105 | .96113 | .96122 | .96130 | .96139 | .96147 | .96155 | 1.76
1.77 | .96164 | .96172 | .96180 | .96189 | .96197 | .96205 | .96213 | .96222 | .96230 | .96238 | 1.77
1.78 | .96246 | .96254 | .96263 | .96271 | .96279 | .96287 | .96295 | .96303 | .96311 | .96319 | 1.78
1.79 | .96327 | .96335 | .96343 | .96351 | .96359 | .96367 | .96375 | .96383 | .96391 | .96399 | 1.79
1.80 | .96407 | .96415 | .96423 | .96431 | .96438 | .96446 | .96454 | .96462 | .96470 | .96477 | 1.80
1.81 | .96485 | .96493 | .96501 | .96508 | .96516 | .96524 | .96532 | .96539 | .96547 | .96554 | 1.81
1.82 | .96562 | .96570 | .96577 | .96585 | .96592 | .96600 | .96608 | .96615 | .96623 | .96630 | 1.82
1.83 | .96638 | .96645 | .96652 | .96660 | .96667 | .96675 | .96682 | .96690 | .96697 | .96704 | 1.83
1.84 | .96712 | .96719 | .96726 | .96734 | .96741 | .96748 | .96755 | .96763 | .96770 | .96777 | 1.84
1.85 | .96784 | .96792 | .96799 | .96806 | .96813 | .96820 | .96827 | .96834 | .96842 | .96849 | 1.85
1.86 | .96856 | .96863 | .96870 | .96877 | .96884 | .96891 | .96898 | .96905 | .96912 | .96919 | 1.86
1.87 | .96926 | .96933 | .96940 | .96947 | .96954 | .96960 | .96967 | .96974 | .96981 | .96988 | 1.87
1.88 | .96995 | .97001 | .97008 | .97015 | .97022 | .97029 | .97035 | .97042 | .97049 | .97055 | 1.88
1.89 | .97062 | .97069 | .97075 | .97082 | .97089 | .97095 | .97102 | .97109 | .97115 | .97122 | 1.89
1.90 | .97128 | .97135 | .97141 | .97148 | .97155 | .97161 | .97168 | .97174 | .97180 | .97187 | 1.90
1.91 | .97193 | .97200 | .97206 | .97213 | .97219 | .97225 | .97232 | .97238 | .97244 | .97251 | 1.91
1.92 | .97257 | .97263 | .97270 | .97276 | .97282 | .97289 | .97295 | .97301 | .97307 | .97313 | 1.92
1.93 | .97320 | .97326 | .97332 | .97338 | .97344 | .97351 | .97357 | .97363 | .97369 | .97375 | 1.93
1.94 | .97381 | .97387 | .97393 | .97399 | .97405 | .97411 | .97417 | .97423 | .97429 | .97435 | 1.94
1.95 | .97441 | .97447 | .97453 | .97459 | .97465 | .97471 | .97477 | .97483 | .97489 | .97494 | 1.95
1.96 | .97500 | .97506 | .97512 | .97518 | .97524 | .97529 | .97535 | .97541 | .97547 | .97552 | 1.96
1.97 | .97558 | .97564 | .97570 | .97575 | .97581 | .97587 | .97592 | .97598 | .97604 | .97609 | 1.97
1.98 | .97615 | .97620 | .97626 | .97632 | .97637 | .97643 | .97648 | .97654 | .97659 | .97665 | 1.98
1.99 | .97670 | .97676 | .97681 | .97687 | .97692 | .97698 | .97703 | .97709 | .97714 | .97720 | 1.99
     | 0      | 1      | 2      | 3      | 4      | 5      | 6      | 7      | 8      | 9      |

TABLE 5—Continued
ANTINORMITS
Thousandths
     | 0      | 1      | 2      | 3      | 4      | 5      | 6      | 7      | 8      | 9      |
2.00 | .97725 | .97730 | .97736 | .97741 | .97746 | .97752 | .97757 | .97763 | .97768 | .97773 | 2.00
2.01 | .97778 | .97784 | .97789 | .97794 | .97800 | .97805 | .97810 | .97815 | .97820 | .97826 | 2.01
2.02 | .97831 | .97836 | .97841 | .97846 | .97851 | .97857 | .97862 | .97867 | .97872 | .97877 | 2.02
2.03 | .97882 | .97887 | .97892 | .97897 | .97902 | .97907 | .97912 | .97917 | .97922 | .97927 | 2.03
2.04 | .97932 | .97937 | .97942 | .97947 | .97952 | .97957 | .97962 | .97967 | .97972 | .97977 | 2.04
2.05 | .97982 | .97987 | .97992 | .97996 | .98001 | .98006 | .98011 | .98016 | .98020 | .98025 | 2.05
2.06 | .98031 | .98035 | .98040 | .98044 | .98049 | .98054 | .98059 | .98063 | .98068 | .98073 | 2.06
2.07 | .98077 | .98082 | .98087 | .98091 | .98096 | .98101 | .98105 | .98110 | .98115 | .98119 | 2.07
2.08 | .98124 | .98128 | .98133 | .98137 | .98142 | .98147 | .98151 | .98156 | .98160 | .98165 | 2.08
2.09 | .98169 | .98174 | .98178 | .98183 | .98187 | .98191 | .98196 | .98200 | .98205 | .98209 | 2.09
2.10 | .98214 | .98218 | .98222 | .98227 | .98231 | .98235 | .98240 | .98244 | .98248 | .98253 | 2.10
2.11 | .98257 | .98261 | .98266 | .98270 | .98274 | .98279 | .98283 | .98287 | .98291 | .98295 | 2.11
2.12 | .98300 | .98304 | .98308 | .98312 | .98316 | .98321 | .98325 | .98329 | .98333 | .98337 | 2.12
2.13 | .98341 | .98346 | .98350 | .98354 | .98358 | .98362 | .98366 | .98370 | .98374 | .98378 | 2.13
2.14 | .98382 | .98386 | .98390 | .98394 | .98398 | .98402 | .98406 | .98410 | .98414 | .98418 | 2.14
2.15 | .98422 | .98426 | .98430 | .98434 | .98438 | .98442 | .98446 | .98450 | .98454 | .98457 | 2.15
2.16 | .98461 | .98465 | .98469 | .98473 | .98477 | .98481 | .98484 | .98488 | .98492 | .98496 | 2.16
2.17 | .98500 | .98503 | .98507 | .98511 | .98515 | .98518 | .98522 | .98526 | .98530 | .98533 | 2.17
2.18 | .98537 | .98541 | .98545 | .98548 | .98552 | .98556 | .98559 | .98563 | .98567 | .98570 | 2.18
2.19 | .98574 | .98577 | .98581 | .98585 | .98588 | .98592 | .98595 | .98599 | .98603 | .98606 | 2.19
2.20 | .98610 | .98613 | .98617 | .98620 | .98624 | .98627 | .98631 | .98634 | .98638 | .98641 | 2.20
2.21 | .98645 | .98648 | .98652 | .98655 | .98659 | .98662 | .98665 | .98669 | .98672 | .98676 | 2.21
2.22 | .98679 | .98682 | .98686 | .98689 | .98693 | .98696 | .98699 | .98703 | .98706 | .98709 | 2.22
2.23 | .98713 | .98716 | .98719 | .98723 | .98726 | .98729 | .98732 | .98736 | .98739 | .98742 | 2.23
2.24 | .98745 | .98749 | .98752 | .98755 | .98758 | .98762 | .98765 | .98768 | .98771 | .98774 | 2.24
2.25 | .98778 | .98781 | .98784 | .98787 | .98790 | .98793 | .98796 | .98800 | .98803 | .98806 | 2.25
2.26 | .98809 | .98812 | .98815 | .98818 | .98821 | .98824 | .98827 | .98830 | .98834 | .98837 | 2.26
2.27 | .98840 | .98843 | .98846 | .98849 | .98852 | .98855 | .98858 | .98861 | .98864 | .98867 | 2.27
2.28 | .98870 | .98873 | .98876 | .98878 | .98881 | .98884 | .98887 | .98890 | .98893 | .98896 | 2.28
2.29 | .98899 | .98902 | .98905 | .98908 | .98910 | .98913 | .98916 | .98919 | .98922 | .98925 | 2.29
2.30 | .98928 | .98930 | .98933 | .98936 | .98939 | .98942 | .98944 | .98947 | .98950 | .98953 | 2.30
2.31 | .98956 | .98958 | .98961 | .98964 | .98967 | .98969 | .98972 | .98975 | .98978 | .98980 | 2.31
2.32 | .98983 | .98986 | .98988 | .98991 | .98994 | .98996 | .98999 | .99002 | .99004 | .99007 | 2.32
2.33 | .99010 | .99012 | .99015 | .99018 | .99020 | .99023 | .99025 | .99028 | .99031 | .99033 | 2.33
2.34 | .99036 | .99038 | .99041 | .99044 | .99046 | .99049 | .99051 | .99054 | .99056 | .99059 | 2.34
2.35 | .99061 | .99064 | .99066 | .99069 | .99071 | .99074 | .99076 | .99079 | .99081 | .99084 | 2.35
2.36 | .99086 | .99089 | .99091 | .99094 | .99096 | .99098 | .99101 | .99103 | .99106 | .99108 | 2.36
2.37 | .99110 | .99113 | .99115 | .99118 | .99120 | .99122 | .99125 | .99127 | .99130 | .99132 | 2.37
2.38 | .99134 | .99137 | .99139 | .99141 | .99144 | .99146 | .99148 | .99151 | .99153 | .99155 | 2.38
2.39 | .99158 | .99160 | .99162 | .99164 | .99167 | .99169 | .99171 | .99174 | .99176 | .99178 | 2.39
2.40 | .99180 | .99182 | .99185 | .99187 | .99189 | .99191 | .99194 | .99196 | .99198 | .99200 | 2.40
2.41 | .99202 | .99204 | .99207 | .99209 | .99211 | .99213 | .99215 | .99218 | .99220 | .99222 | 2.41
2.42 | .99224 | .99226 | .99228 | .99230 | .99232 | .99234 | .99237 | .99239 | .99241 | .99243 | 2.42
2.43 | .99245 | .99247 | .99249 | .99251 | .99253 | .99255 | .99257 | .99260 | .99262 | .99264 | 2.43
2.44 | .99266 | .99268 | .99270 | .99272 | .99274 | .99276 | .99278 | .99280 | .99282 | .99284 | 2.44
2.45 | .99286 | .99288 | .99290 | .99292 | .99294 | .99296 | .99298 | .99299 | .99301 | .99303 | 2.45
2.46 | .99305 | .99307 | .99309 | .99311 | .99313 | .99315 | .99317 | .99319 | .99321 | .99322 | 2.46
2.47 | .99324 | .99326 | .99328 | .99330 | .99332 | .99334 | .99336 | .99338 | .99339 | .99341 | 2.47
2.48 | .99343 | .99345 | .99347 | .99349 | .99350 | .99352 | .99354 | .99356 | .99358 | .99359 | 2.48
2.49 | .99361 | .99363 | .99365 | .99367 | .99368 | .99370 | .99372 | .99374 | .99376 | .99377 | 2.49

     | 0      | 1      | 2      | 3      | 4      | 5      | 6      | 7      | 8      | 9      |
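
Entries of this kind can be spot-checked by machine. The sketch below assumes that the tabulated antinormit is the standard normal cumulative probability at the argument given by the row label plus the column thousandth, which is what the printed values are consistent with; the function names are illustrative only.

```python
import math


def antinormit(x: float) -> float:
    """Standard normal cumulative probability at x (assumed meaning of the table)."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def table_row(x0: float) -> list[float]:
    """Ten entries for x0 + 0.000, ..., x0 + 0.009, rounded to five places."""
    return [round(antinormit(x0 + t / 1000.0), 5) for t in range(10)]


if __name__ == "__main__":
    for x0 in (1.50, 1.96, 2.00):
        print(f"{x0:.2f}", table_row(x0))
```

Comparing a recomputed row with the printed one is a quick way to locate digits that were damaged in reproduction.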
NUMERICAL ANALYSIS RESEARCH UNPUBLISHED
STATISTICAL TABLES


D. TEICHROEW*
University of California, Los Angeles


THIS note lists tables connected with probability and statistics which
were computed at what is now Numerical Analysis Research,
University of California, Los Angeles and before July 1, 1954 was the
Institute for Numerical Analysis, National Bureau of Standards.

Publication of tables is expensive and funds for the publication of 
statistical tables are extremely hard to find. It is therefore likely that 
many of the tables listed below will not be published. Many of them 
are in fact not yet ready for publication, and some are too special ever 
to warrant publication. It seems worthwhile to report the existence of 
these tables and indicate their present state. 

Some of the tables were computed as ends in themselves; others were 
by-products in computations or were computed because the machine 
codes were available from other problems. In general, the tables have 
not been verified beyond the accuracy required for the purpose for 
which each was intended. It seems worthwhile, however, to list tables 
which are not completely verified because they may simplify the check- 
ing procedures if the functions are recomputed. 

The number of digits given in some of the tables may seem excessive 
to many statisticians. The number of digits used is a consequence of 
the fact that most of the tables were computed on the SWAC (Bureau of 
Standards Western Automatic Computer). This machine operates on 
a basic number length of 36 binary digits (10.8 decimal digits) and there 
is no point in using less than an integral multiple of the basic length 
and, in general, the tables retain most of the digits used in the compu- 
tation. 
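
The word-length arithmetic can be checked directly. The short sketch below converts binary digits to decimal digits; the only input taken from the text is the 36-bit word, and the function name is illustrative.

```python
import math

BITS_PER_WORD = 36  # SWAC basic number length, as stated in the text


def decimal_digits(bits: int) -> float:
    """Number of decimal digits carried by the given number of binary digits."""
    return bits * math.log10(2.0)


if __name__ == "__main__":
    for words in (1, 2, 3):
        bits = words * BITS_PER_WORD
        print(words, "word(s):", round(decimal_digits(bits), 1), "decimal digits")
```

One word gives about 10.8 decimal digits, and two and three words give about 21.7 and 32.5, which is in line with the 21-place and 32-place tables listed in Section I.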

Publication appears probable for the following three tables and they 
are therefore not listed elsewhere in this note. 

1. “Tables of the bivariate normal distribution function and related 
functions.” Collated by the National Bureau of Standards. The 
introduction is by G. Blanch. 

* The preparation of this table was sponsored (in part) by the Office of Naval Research, USN and
the School of Aviation Medicine, USAF. The author's present address is: National Cash Register
Company, Hawthorne, Calif.
The functions given are

\[
L(h, k, r) = \frac{1}{2\pi\sqrt{1-r^2}} \int_h^\infty dx \int_k^\infty \exp\left[-\frac{1}{2}\left(\frac{x^2+y^2-2rxy}{1-r^2}\right)\right] dy,
\]

\[
V(h, \lambda h) = \frac{1}{2\pi} \int_0^h dx \int_0^{\lambda x} \exp\left[-\frac{1}{2}\left(x^2+y^2\right)\right] dy.
\]





2. “Tables of salvo kill probabilities for square targets,” Applied
Mathematics Series 44, National Bureau of Standards. Introduction
by A. D. Hestenes.¹

This table gives values of the function 


1 © -) 
Pee eerie f f Qs, 1) 
2mo4,04, —o ¥ —w« 


1 po 2 
(n — Yo) |aan 
o2 


where 


Q(¢, n) = 1 — [1 — PrPalf, 0) }'; 


and 


1 a a 1 _ 2 1 —_ 2 
Pt, )==—— ff exe| -> wi Ee ana, 
2ror,oR, alll aay 2 o* 2 o* 


3. “Empirical power functions for nonparametric two-sample tests 
for small samples.” D. Teichroew. Accepted for publication in 
the Annals of Mathematical Statistics. 

This paper gives the empirical frequencies of all possible rankings
which are obtained when a sample of m from a normal population of
zero mean and unit variance and a sample of n from a similar population
but of mean δ are ranked in order of size, for (m, n) = (8, 2), (3, 3),
(4, 2) and (4, 3) and various values of δ.
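
For readers who wish to check individual values of the bivariate normal function L(h, k, r) of the first table above, the following is a minimal sketch. It assumes the usual reading of L as the probability that both standardized variables exceed h and k respectively, and it reduces the double integral to a single one evaluated by Simpson's rule; the step count and the upper cut-off are arbitrary choices, not part of the tables.

```python
import math


def phi(x: float) -> float:
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def upper_tail(x: float) -> float:
    """P(Z > x) for a standard normal Z."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def bivariate_L(h: float, k: float, r: float,
                cutoff: float = 10.0, steps: int = 4000) -> float:
    """P(X > h, Y > k) for correlation r with |r| < 1, by Simpson's rule.

    Conditional on X = x, Y is normal with mean r*x and variance 1 - r^2,
    so the inner integral is an upper tail probability.
    """
    s = math.sqrt(1.0 - r * r)

    def g(x: float) -> float:
        return phi(x) * upper_tail((k - r * x) / s)

    a, b = h, max(cutoff, h + 1.0)
    dx = (b - a) / steps  # steps must be even for Simpson's rule
    total = g(a) + g(b)
    for i in range(1, steps):
        total += (4.0 if i % 2 else 2.0) * g(a + i * dx)
    return total * dx / 3.0


if __name__ == "__main__":
    # Known closed form at the origin: 1/4 + arcsin(r) / (2*pi).
    print(bivariate_L(0.0, 0.0, 0.5), 0.25 + math.asin(0.5) / (2.0 * math.pi))
```

The closed form at h = k = 0 gives an independent check on the quadrature.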

All the tables listed below exist on punched cards. In addition, two
tables mentioned in section II, namely, those containing y(p; λ) and
Q_i(p), were multilithed and a number of copies have been distributed.
It is hoped that these two tables will eventually be published in one
book.

¹ This table has been published since this note was written.

The tables are listed under five categories: 
I. Tables associated with the normal distribution. 
II. Tables associated with the Gamma distribution. 
III. Tables associated with the t-distribution.
IV. Tables for selecting samples from certain distributions. 
V. Miscellaneous tables. 


I. TABLES ASSOCIATED WITH THE NORMAL DISTRIBUTION 














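Let

\[
f(z) = \frac{1}{\sqrt{2\pi}}\, e^{-z^2/2}, \qquad F(z) = \int_{-\infty}^{z} f(t)\, dt,
\]

\[
H(z; b) = \int_{-\infty}^{z} [F(t)]^{b}\, dt,
\]

\[
v(a, b) = \int_{-\infty}^{\infty} [F(z)]^{a}\, [1 - F(z)]^{b}\, dz,
\]

\[
E(z_j; N) = \frac{N!}{(j-1)!\,(N-j)!} \int_{-\infty}^{\infty} z\, f(z)\, [F(z)]^{j-1}\, [1 - F(z)]^{N-j}\, dz,
\]

\[
E(z_j^2; N) = \frac{N!}{(j-1)!\,(N-j)!} \int_{-\infty}^{\infty} z^2 f(z)\, [F(z)]^{j-1}\, [1 - F(z)]^{N-j}\, dz,
\]

\[
E(z_i z_j; N) = \frac{N!}{(i-1)!\,(j-i-1)!\,(N-j)!}\; G(i-1,\, N-j,\, j-i-1),
\]

\[
G(m, n, p) = \iint_{x<y} x\, y\, f(x)\, f(y)\, [F(x)]^{m}\, [1 - F(y)]^{n}\, [F(y) - F(x)]^{p}\, dx\, dy,
\]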

and

\[
K(\delta; \alpha, \beta) = (\beta + 1) \int_{-\infty}^{\infty} [F(z + \delta)]^{\alpha}\, [1 - F(z)]^{\beta} f(z)\, dz.
\]
The range and size of the tables are as follows:

Function | Range | Decimal Places Tabulated | Accuracy
f(z) | z = -12.00(.02)12.00 | 32 | ≥27
[F(z)]^k | z = -12.00(.02)12.00; k = 1(1)19 | 32 | ≥25
H(z; b) | z = -12.00(.02)12.00; b = 1(1)19 | 32 | ≥25
v(a, b) | a, b = 1(1)19 | 32 | ≥25
E(z_j; N) | j = 1(1)N; N = 1(1)22 | 21 | ≥19
E(z_j^2; N) | j = 1(1)N; N = 1(1)21 | 21 | ≥19
E(z_i z_j; N) | i, j = 1(1)N; N = 1(1)20 | 21 | ≥19
K(δ; α, β) | δ = -3.2(.1)0(.01)6.4; α = 1(1)9; β = 0(1)4 | 8 | 8
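
As an illustration of how one of these quantities might be recomputed for checking, the sketch below evaluates the expected value of the j-th smallest of N standard normal observations, which is the reading of E(z_j; N) assumed here. It uses Simpson's rule over the tabulated range -12 to 12; the step count is an arbitrary choice and the result carries only ordinary double precision, far short of the 21 places in the table.

```python
import math


def f(z: float) -> float:
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def F(z: float) -> float:
    """Standard normal distribution function."""
    return 0.5 * math.erfc(-z / math.sqrt(2.0))


def expected_order_statistic(j: int, N: int,
                             lo: float = -12.0, hi: float = 12.0,
                             steps: int = 4800) -> float:
    """Expected value of the j-th smallest of N standard normal observations."""
    c = math.factorial(N) / (math.factorial(j - 1) * math.factorial(N - j))

    def g(z: float) -> float:
        upper = 0.5 * math.erfc(z / math.sqrt(2.0))  # 1 - F(z), without cancellation
        return z * f(z) * F(z) ** (j - 1) * upper ** (N - j)

    dz = (hi - lo) / steps  # steps must be even for Simpson's rule
    total = g(lo) + g(hi)
    for i in range(1, steps):
        total += (4.0 if i % 2 else 2.0) * g(lo + i * dz)
    return c * total * dz / 3.0


if __name__ == "__main__":
    # For N = 2 the larger observation has expectation 1/sqrt(pi).
    print(expected_order_statistic(2, 2), 1.0 / math.sqrt(math.pi))
```

The known value for N = 2 and the symmetry E(z_j; N) = -E(z_{N+1-j}; N) provide simple checks.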





II. TABLES ASSOCIATED WITH THE GAMMA DISTRIBUTION 


Two SWAC routines have been developed for computing p(y; λ)
and y(p; λ), where

\[
p(y; \lambda) = \frac{1}{\Gamma(\lambda)} \int_0^{y(p;\lambda)} e^{-t}\, t^{\lambda-1}\, dt.
\]

1. The first computes p when y is given for λ an integer by summing
the series

\[
p(y; \lambda) = 1 - e^{-y}\left(1 + y + \frac{y^2}{2!} + \frac{y^3}{3!} + \cdots + \frac{y^{\lambda-1}}{(\lambda-1)!}\right).
\]

The routine is also used to compute y for a given p by inverse
interpolation.
2. The second routine computes y for a given p by the asymptotic
series (using terms up to and including Q_{11})

\[
y(p; \lambda) = c^2 + Q_1(p)\, c + Q_2(p) + \frac{Q_3(p)}{c} + \cdots,
\]

where λ = c². The form of the Q functions was given by Campbell (1923).
These routines have been used to compute the following tables:











Function | Range | Decimal Places Tabulated | Accuracy
p(y; λ) | y = 0(.5) varying, <64.0; λ = 2(1)20 | 10 | 5-10
y(p; λ) | p = .000(.001).999; λ = 2(1)15, 20(10)50, 100 | 8 | 4-8
2y(p; λ) | p = .000(.001).999; 2λ = 3(1)5(2)29 | 7 | 4-7
Q_i(p) | p = .500(.001).999; i = 1(1)11 | 8 | 8
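
The first routine described above can be sketched as follows for integer λ. The finite series is summed term by term; the inversion uses bisection purely as a simple stand-in for the inverse interpolation mentioned in the text, and the function names and tolerance are illustrative assumptions.

```python
import math


def p_of_y(y: float, lam: int) -> float:
    """p(y; lam) = 1 - exp(-y) * sum_{j=0}^{lam-1} y**j / j!  for integer lam >= 1."""
    term = 1.0
    total = 1.0
    for j in range(1, lam):
        term *= y / j
        total += term
    return 1.0 - math.exp(-y) * total


def y_of_p(p: float, lam: int, tol: float = 1e-12) -> float:
    """Solve p_of_y(y, lam) = p for y, with 0 <= p < 1, by bisection."""
    lo, hi = 0.0, 1.0
    while p_of_y(hi, lam) < p:  # expand until the root is bracketed
        hi *= 2.0
    while hi - lo > tol * max(1.0, hi):
        mid = 0.5 * (lo + hi)
        if p_of_y(mid, lam) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    y = y_of_p(0.5, 3)
    print(y, p_of_y(y, 3))
```

For λ = 1 the function reduces to 1 - exp(-y), so the median is log 2, which gives a convenient check.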





III. TABLES ASSOCIATED WITH THE t DISTRIBUTION


The tables are concerned with solving the equation

\[
p = \frac{\Gamma\left(\frac{n+1}{2}\right)}{\sqrt{n\pi}\;\Gamma\left(\frac{n}{2}\right)} \int_{-\infty}^{t(p;n)} \left(1 + \frac{x^2}{n}\right)^{-(n+1)/2} dx
\]

for t(p; n) when p and n are given. An approximation to t(p; n) is
obtained by summing the first 8 terms of the asymptotic series

\[
t(p; n) = z + \frac{H_1(p)}{n} + \frac{H_2(p)}{n^2} + \cdots
\]

The first four H functions were determined by Hotelling and Frankel
[2]. Their method was used to get H_5, H_6, H_7 and H_8.
The tables available are the following:

Function | Range | Decimal Places Tabulated | Accuracy
H_i(p) | p = .500(.001).999; i = 1(1)8 | 8 | 8
t(p; n) | p = .500(.001).999; n = 2(1)16, 20, 25, 50, 100, 200 | 8 | 2-8
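
The behavior of such a series can be illustrated with its first two correction terms. The sketch below assumes that the leading term is the standard normal deviate for p and uses the well-known polynomial forms of the first two corrections, written in terms of that deviate; it is an illustration of the expansion, not a reproduction of the eight-term SWAC computation.

```python
from statistics import NormalDist


def t_quantile_two_terms(p: float, n: int) -> float:
    """Approximate t(p; n) by the normal deviate plus two correction terms."""
    z = NormalDist().inv_cdf(p)                   # normal deviate for probability p
    h1 = (z ** 3 + z) / 4.0                       # first correction, divided by n
    h2 = (5.0 * z ** 5 + 16.0 * z ** 3 + 3.0 * z) / 96.0  # second, divided by n**2
    return z + h1 / n + h2 / n ** 2


if __name__ == "__main__":
    for n in (10, 25, 100):
        print(n, round(t_quantile_two_terms(0.975, n), 4))
```

With only two corrections the approximation is already close for n = 100 and noticeably rougher for n = 10, which is consistent with the accuracy of the tabulated t(p; n) ranging from 2 to 8 places.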





IV. TABLES FOR SELECTING SAMPLES FROM CERTAIN DISTRIBUTIONS 


One method of selecting samples from distributions consists of the
following steps:

1. Compute, 0(k), the sum of k variates uniformly distributed on 
(0, 1). 

2. Let 


z= afi, 
i=0 

The a_i are chosen so that z has the required distribution. (This method
is developed in Teichroew (1953)). These coefficients have been computed
for the case where k=8 and for the following distributions:

1. The normal distribution

2. t/√n, where t has a normal distribution, for n = 50(1)200

3. The Gamma distribution, for λ = 2(1)15, 20, 25, 50, 100.

4. 1/√y, where y has a Gamma distribution, for λ = 2(1)15, 20, 25,
50, 100.

The variates t/√n and 1/√y have been used to generate random values
of the inverses of Wishart matrices. (If y has a Gamma distribution
with parameter λ, then 2y has a Chi-Square distribution with 2λ
degrees of freedom.)


V. MISCELLANEOUS TABLES 


1. Multinomial coefficients. 
This table gives the function

\[
\frac{(m+n+p+2)!}{m!\, n!\, p!}
\]

for all combinations of m, n, and p such that m+n+p ≤ 18. The function
occurs in the expression for the covariances of order statistics.
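
A table of this kind is easily regenerated with exact integer arithmetic. The sketch below assumes that m, n and p range over the non-negative integers with m + n + p not exceeding 18; the function names are illustrative.

```python
from math import factorial


def coefficient(m: int, n: int, p: int) -> int:
    """(m + n + p + 2)! / (m! n! p!), computed exactly."""
    return factorial(m + n + p + 2) // (factorial(m) * factorial(n) * factorial(p))


def build_table(limit: int = 18) -> dict[tuple[int, int, int], int]:
    """All coefficients with m, n, p >= 0 and m + n + p <= limit."""
    return {
        (m, n, p): coefficient(m, n, p)
        for m in range(limit + 1)
        for n in range(limit + 1 - m)
        for p in range(limit + 1 - m - n)
    }


if __name__ == "__main__":
    table = build_table()
    print(len(table), table[(1, 1, 1)])
```

Under the stated assumption the table has 1330 entries, many of which coincide because the function is symmetric in m, n and p.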


2. Coefficients for curve fittings by Chebyshev polynomials. 
This table gives 10 decimal digit values of

\[
\cos\left[\frac{k\pi}{2n}\,(2\alpha + 1)\right]
\]

for n = 2(2)12, 16, 18, 20, 25, 30, 40; k = 1(1)n-1; α = 0(1)n-1.
This function appears in the curve fitting method described, for example,
in Tables of Chebyshev Polynomials S_n(x) and C_n(x), National
Bureau of Standards, Applied Mathematics Series 9. Introduction
by Cornelius Lanczos.
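
The cosine coefficients are simple to regenerate. The sketch below assumes the reading of the formula as cos[kπ(2α + 1)/(2n)] with k = 1, ..., n-1 and α = 0, ..., n-1, and rounds to 10 decimals to match the stated size of the table; the function names are illustrative.

```python
import math


def chebyshev_coefficient(n: int, k: int, alpha: int) -> float:
    """cos(k * pi * (2*alpha + 1) / (2*n))."""
    return math.cos(k * math.pi * (2 * alpha + 1) / (2 * n))


def coefficient_table(n: int) -> list[list[float]]:
    """Rows k = 1..n-1, columns alpha = 0..n-1, rounded to 10 decimals."""
    return [
        [round(chebyshev_coefficient(n, k, a), 10) for a in range(n)]
        for k in range(1, n)
    ]


if __name__ == "__main__":
    for row in coefficient_table(4):
        print(row)
```

Each row sums to zero over α, a property of cosines taken at these equally spaced angles, which gives a quick check on a regenerated table.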


3. Tables for Probit Analysis with Poisson Error Models. 
Three functions have been tabulated: 























> e~*"h4 a+B8 logio d et /2 
ae a em Vie 
© g-hhd e—1/2(at8 logio a)? 
Z(a, B;h) = 
(a, B; h) 2 d! Jae 
> —hpd e—1/2(at+8 logio a)? 
T(a; B;h) = logio d 
(a; B; h) ah (logio d)  - 


for h = 1(1)17, β = 0(.5)10, α = -5(.5)5. The accuracy varies from 3
to 8 decimal places.
A REFINEMENT IN THE USE OF MARK-SENSE CARDS
FOR TEST RESEARCH


VALENTINE APPEL
Nowland and Company, Greenwich, Conn.
AND
GEORGE COOPER
Statistical Tabulating Company, New York, N.Y.


A MAJOR deterrent to the wider use of punched card methods in the
analysis of psychological test data is the time-consuming and
costly procedure which is necessary in order to punch the item re- 
sponses into the cards. For this reason, most researchers have preferred 
to use the IBM test scoring machine for purposes of test scoring and 
item analysis. 

The use of the test scoring machine is limited, however, whenever 
the statistical design becomes complex, or the sample size becomes 
especially large. This limitation derives from the necessity to hand 
sort all test papers, and subsequently to feed them one by one into 
the scoring machine. The speed at which analyses can be accomplished 
is, therefore, limited by the rapidity with which an operator can 
manually shuffle the test papers and then feed them into the machine. 

Punched card procedures do not suffer from this limitation, and 
recently techniques and devices have been developed which obviate 
the necessity for the time consuming key punching procedure. One of 
these devices is the Document-to-Card Punch which has been devel- 
oped by the Personnel Research Branch of the Adjutant General’s 
Office. “This device consists of several components—first, a test scoring 
machine chassis containing a sensing unit and plug board. The IBM 
answer sheet is fed into the hopper of this machine. Then a tabulating 
card, with punched holes corresponding to the item responses, is pre- 
pared by the second component, which resembles a modified IBM 
reproducer. Unique identifying information for each answer sheet is 
transcribed into the punched card concurrently by means of the third 
component, a manually operated keyboard” [3, p. 155]. Unfortunately, 
this machine is not generally available for non-government work. 

Several attempts have been made to adapt mark-sense cards as 
examination answer sheets in order to permit the examinees to record 
their responses directly on to a punch card. One of these applications 
was described in an article by Gage and Remmers [2] in which mark- 
sense cards were used in a self-administered student opinion poll, and 
another was mentioned in a recent article by Staugas [5] in which he
described a method for scoring tests from punched cards. The present 
article describes another application of the use of mark-sense cards in 
test research. This application outlines a procedure which is not limited 
by the 27 column capacity of the standard mark-sense card, and which 
incorporates a machine checking procedure which permits the editing 
of multiple-coded columns. 

The study to be reported here involved the administration of a 432 
item True-False questionnaire to a group of 500 college students. Be- 
cause considerable item intercorrelation was anticipated, it was re- 
garded as preferable to have the responses punched into IBM cards for 
later item analysis. In order to save the time and cost of manually 
key punching and verifying these questionnaires, and because the 
group to be tested was an intelligent one, it was decided to use 
mark-sense cards and to allow the examinees to record their responses 
directly. 

Three cards were used for each examinee as follows: One box of 
standard IBM mark-sense stock was overprinted as shown in Figure 1.¹
Three decks of 500 overprinted cards were then separately punched 
with consecutive numbers (001-500) using a sequence numbered deck 
in the reproducing punch. At the same time each deck was gang- 
punched with the letter S, M, or V, which corresponded to a similarly 
labeled section of 144 items in the questionnaire. The three decks were 
interpreted and then merged on the collator in sequence 001 S, 001 M, 
001 V, 002 S, 002 M, 002 V, etc., resulting in 500 sets of three cards 
each. At the time of the administration of the questionnaires each 
examinee was supplied with an electrographic pencil and one set of 
three mark-sense cards, on the backs of which the examinees recorded 
their names and other pertinent background data. 

¹ The writers wish to express their appreciation to Mr. Herman Greenblatt, formerly of Richardson,
Bellows, Henry & Co., who was responsible for the card layout and the overprinting.

Fig. 1. Example of a Mark-Sense IBM Card Overprinted
as an Answer Sheet.

Because of possible inconsistencies in the examinees’ marking of the
cards, and because of possible machine errors in converting the
electrographic pencil marks into punches, it was essential that a procedure
be established whereby such discrepancies could be isolated and
corrected. Ordinarily this can be done automatically by means of the
IBM mark-sense reproducer’s multiple punch and blank column
detector, which rejects all cards having more or less than one punch in a
column. But because of the fact that six items were coded into each
column, this procedure was not possible. For this reason, after all the
questionnaires were administered, the three decks of cards containing
the mark-sensed item responses were passed several times through the
IBM reproducer in order to convert the electrographic pencil marks
into punches. The IBM Electronic Statistical Machine, Type 101, was
then wired so as to reject any card which did not conform to the
following pattern of six punches in each of the 24 marked columns: X or Y,
0 or 1, 2 or 3, 4 or 5, 6 or 7, and 8 or 9. As each card was rejected it
was edited and the procedure was iterated until all of the cards
conformed to the prescribed pattern. Although actual records were not
kept at the time, roughly between 10 per cent and 20 per cent of the
1500 cards required some editing in order to make them conform to
the prescribed pattern. Most of these editorial changes resulted from
occasional item omissions which were arbitrarily coded by the editor
as “false.” Of the total number of examinees to whom the questionnaire
was administered seven cases had to be eliminated from the
sample because of failure to follow instructions.
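
The editing rule described above, exactly one punch from each of the six pairs in every marked column, can be expressed as a short check. The sketch below is only an illustration of that rule: the representation of a column as a set of one-character row labels is an assumption made for the example, and no attempt is made to say which member of a pair stood for "true."

```python
# The six punch pairs named in the text; each pair carries one True-False item.
PAIRS = [("X", "Y"), ("0", "1"), ("2", "3"), ("4", "5"), ("6", "7"), ("8", "9")]

CARDS_PER_EXAMINEE = 3
COLUMNS_PER_CARD = 24
ITEMS_PER_COLUMN = len(PAIRS)


def column_ok(punches: set[str]) -> bool:
    """True when the column holds exactly one punch from each of the six pairs."""
    return len(punches) == len(PAIRS) and all(
        len(punches & set(pair)) == 1 for pair in PAIRS
    )


def card_ok(columns: list[set[str]]) -> bool:
    """True when all 24 marked columns of a card conform to the pattern."""
    return len(columns) == COLUMNS_PER_CARD and all(column_ok(c) for c in columns)


if __name__ == "__main__":
    good = {"X", "1", "2", "5", "6", "9"}
    omitted = {"X", "1", "2", "5", "6"}            # one item left blank
    doubled = {"X", "Y", "1", "2", "5", "6", "9"}  # both positions of one item marked
    print(column_ok(good), column_ok(omitted), column_ok(doubled))
    print(CARDS_PER_EXAMINEE * COLUMNS_PER_CARD * ITEMS_PER_COLUMN)  # items per examinee
```

A card failing the check corresponds to one that the machine would have rejected for editing; three cards of 24 columns at six items per column account for the 432 items of the questionnaire.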

Once all the cards were properly edited, they were sorted into the 
examinee’s code number sequence, each of the three decks separately. 
The first deck was reproduced into regular IBM card stock. The second 
and third decks were then successively reproduced into this new card, 
each time comparing on the examinee’s code number to make certain 
that each examinee’s three mark-sense cards had been properly inte- 
grated into the one new card. In this way, one punched card was 
created for each examinee, each card having 432 True-False question- 
naire items punched into 72 columns, six items to a column. The re- 
maining eight columns were used for the examinee’s identification 
code number and other pertinent information. This deck of master 
cards was then employed for the purpose of item analysis, and for
later scoring and statistical analysis. 

Although the study reported here employed a card format which only 
permitted six two-alternative items to each card column, the method 
is by no means limited to this. Mark-sense cards may easily be over- 
printed in any way desirable, and any or all of the twelve marking 
positions in each column may be employed. While this procedure is 
not applicable to all test research problems, there are many situations
in which the use of a mark-sense card in place of an answer sheet can 
be most economical of time and effort. Situations in which the method 
may be used to its greatest advantage are those in which the group to 
be tested is fairly cooperative and intelligent, in which the sample size 
is large, and in which complex item analyses are to be performed. 
On the Accuracy of Short-Term Entrepreneurial Expectations. OSKAR ANDERSON JR., RAINALD K.
BAUER, and EBERHARD FELS.

During the past four years, the Ifo Institute for Economic Research in Munich, Germany, has 
gathered monthly information on anticipated and actual changes of economic microdata as seen by 
businessmen. The data are not numerical but indicate trends only. 

In this study, the data of 104 producers of outer garments, 110 textile manufacturers, 77 textile 
wholesalers, and 442 textile retailers have been evaluated with a view to the accuracy of their predictions 
as compared with their ensuing reports on actual changes. The results of this micro-approach are later 
compared with the findings as they are usually represented in macro-form in the so-called Business 
Mirrors of the Ifo Institute. 

First it is found that industrialists appear to have one-month prediction horizons in mind, although 
the pertinent questions refer to two-month intervals. 

Second, the percentage of businessmen’s correct estimates increases, once a persistent, uniform 
trend is under way. Otherwise, entrepreneurs have a seeming predilection for giving indifferent expecta- 
tions. For prices, however, they are likely to exaggerate forthcoming upward movements and minimize 
downward movements in their predictions; in stating actual changes, the reverse holds true. 

All findings are subjected to detailed investigations with regard to the various sub-branches and 
the sizes of the firms. The problem of dealing with homogeneous sub-sets of data is also treated. System- 
atic errors in entrepreneurial reporting habits are investigated and a suitable graphical form is given for 
illustrating the decisive “indifference intervals.” 

Finally, it is found that accurate-prediction frequencies are far better when regarded individually 
than when treated summarily in aggregates. 


A Graduate Course in Basic Statistical Analysis for Majors or Minors in Statistics. R. L. ANDERSON, 


This paper discusses a two-semester course in basic statistical analysis, taught for the first time at 
the University of North Carolina last year. The course was designed to parallel a similar course in basic 
statistical theory. Most of those students who took both courses seemed to do very well; however, 
students taking only the analysis course without a previous course in theory were severely handicapped. 
The course was divided into three parts: descriptive methods, sampling experiments, and analysis 
of data and design of experiments and surveys. The first part took too long and did not interest the 
students. Improved descriptive tools are needed, and better methods of teaching them, at least better 
than were used in this first attempt. 
The sampling experiments were designed to present the basic ideas of statistical inference by 
drawing samples from known populations. For many of the students, the tedium of computing tended 
to obscure the main objectives of the study. More time needs to be spent in preliminary lectures and on 
complete sampling from small populations to demonstrate the meaning of expectations. We found that 
students were able to utilize the results of large scale sampling outside of class, e.g., IBM sampling. 
After some revision of the teaching procedure, I feel that the use of empirical sampling offers real 
promise in presenting the ideas of statistical inference. It is important to emphasize the need for this 
when theory is not available. 
Even though the analysis and design part of the course was curtailed, the results seemed to be 
highly satisfactory. It would be desirable to have some impartial observer conduct an examination on 
analysis and design to find out if the students learned general principles or only those procedures 
mentioned by the teacher. 
In order to have time to present the important methods of collecting and analyzing data, the
following changes are suggested:

(1) Cut down the time devoted to the introduction by distributing mimeographed materials and
requiring outside reading.

(2) Condense the descriptive materials.

(3) Have results of large scale sampling experiments available before the class work begins.

(4) Introduce empirical sampling by drawing all possible samples from small populations.

(5) Be sure lectures precede the sampling, so that students have a preview of the purposes of
empirical sampling.

(6) Schedule 3 lecture hours and one three-hour supervised laboratory each week. Emphasize that
some non-supervised laboratory work will also be expected.


One final comment: If an analysis course is designed to parallel a similar course in theory, all 
students should either be taking the theory course or have had it already. Otherwise, it is imperative 
that a certain amount of theory be taught in the analysis course. 
The Criticism of Transformations. F. J. ANSCOMBE and JOHN W. TUKEY, University of Cambridge and
Princeton University.


The classical methods of analysis of variance and regression are flexible, relatively easy to apply, 
and far more widely familiar than any others. When they do not apply directly and it is possible to 
apply them wisely to transformed responses, it is almost always better to bend the data to the analysis 
than to bend the analysis to the data. Purposes of transforming data are: (1) to increase the additivity 
or simplicity of treatment effects (generally the most important purpose, affecting the quality of in- 
ferences to be made from the data), (2) to make the variance more constant, and (3) to reduce non- 
normality. Two general types of criticism of transformations on the basis of experimental data are: (a) 
the running check, experiment by experiment, to detect cases of extreme non-additivity, non-constancy 
of variance and non-normality, and (b) the serious study of a particular field of experimentation to
determine what transformations are most reasonable for routine use in that field. For (b), the data from a
number of experiments will need to be analysed according to many (perhaps 5, 10 or 20) transformations.
Families of transformations form spaces with structure determined by their statistical properties.
The conclusion of study (b) will be to specify a confidence region in the space of transformations for the 
satisfactory transformation. 

The procedure suggested for testing the adequacy of any transformation is essentially as follows 
(but many variations of detail are possible). Corresponding to each transformed observation y calculate 
the fitted value Y given by least squares and the residual z=y — Y. Plot the z’s against the Y's. Remov- 
able non-additivity appears as a curved regression, removable non-constancy of variance appears as a 
wedge-shaped outline, and extreme non-normality is reflected in non-normality of the marginal distribu- 
tion of the z’s. If it is desired to supplement this graphical analysis by calculated tests or estimates, the 
following statistics may be used: 


(i) $\sum zY^2$, (ii) $\sum z^2 Y$, (iii) $\sum z^3$, (iv) $\sum z^4$,


where (i) is for non-additivity (see Tukey, Biometrics, 5 (1949), 232-42), (ii) is for non-constant variance, 
and (iii) and (iv) are for skewness and kurtosis. 
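
As a rough illustration of the procedure, the sketch below fits an additive row-plus-column model to a complete two-way table by least squares, forms the residuals z = y - Y, and computes the four sums of products of residuals and fitted values. The two-way layout, the function names and the data are assumptions made for the example; the abstract does not prescribe a particular design, and no significance levels are attached to the statistics here.

```python
def additive_fit(table):
    """Least-squares fit of row + column effects to a complete two-way table.

    Returns (fitted, resid), each with the same shape as `table`.
    """
    r, c = len(table), len(table[0])
    grand = sum(map(sum, table)) / (r * c)
    row_means = [sum(row) / c for row in table]
    col_means = [sum(table[i][j] for i in range(r)) / r for j in range(c)]
    fitted = [[row_means[i] + col_means[j] - grand for j in range(c)]
              for i in range(r)]
    resid = [[table[i][j] - fitted[i][j] for j in range(c)] for i in range(r)]
    return fitted, resid


def criticism_statistics(fitted, resid):
    """Sums of products of residuals z and fitted values Y."""
    Y = [v for row in fitted for v in row]
    z = [v for row in resid for v in row]
    return {
        "nonadditivity": sum(zi * Yi ** 2 for zi, Yi in zip(z, Y)),      # sum z Y^2
        "nonconstant_variance": sum(zi ** 2 * Yi for zi, Yi in zip(z, Y)),  # sum z^2 Y
        "skewness": sum(zi ** 3 for zi in z),                            # sum z^3
        "kurtosis": sum(zi ** 4 for zi in z),                            # sum z^4
    }


# Hypothetical multiplicative table: rows (1, 2, 4) times columns (1, 2, 3).
raw = [[1, 2, 3], [2, 4, 6], [4, 8, 12]]
fitted, resid = additive_fit(raw)
print(criticism_statistics(fitted, resid))
```

For a table that is multiplicative on the raw scale, the first statistic is clearly non-zero; taking logarithms would make the table exactly additive and reduce every residual to zero.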


Statistics and Planning Educational Operations. C. M. Armstrong, New York State Education Dept. 

The statistics that should be available for administration in educational institutions are inadequate. 
In some cases the inadequacy is the result of lack of any records at all and in other cases it is the result 
of poorly designed or poorly executed reporting plans. 

Some of the areas in which statistics are most essential are (1) determining the size and location of 
school buildings, (2) planning the organization of the teaching force, (3) determining the teacher de- 
mand and supply, (4) planning for the guidance of students, (5) planning the curriculum and (6) estab- 
lishing and maintaining standards. 

Areas particularly needing improvement are (1), (3), (4), and (6). 

An important improvement is to reconcile and improve school census data and United States 
Census data so that they can be used interchangeably. The present inconsistencies are confusing to 
those trying to use the data. 

In the areas of child guidance and establishing and maintaining standards, a great deal of research 
work is needed to devise techniques for processing data on child characteristics and growth so that the 
actual changes in the school children can be used more effectively as guides to administrative action. 
With more attention to these statistics there is a strong possibility that a system of controls for school 
processes might be established that would correspond with the statistical quality control techniques 
used in manufacturing. 


The Development of Census Tracts in the United States. C. E. Batschelet, Census Bureau. 

The first census tracts were established in eight of the larger cities of the United States in connection 
with the Census of 1910. Since that time the census tract program has developed at an accelerated rate 
until now there are tracts in 142 of the 238 cities of 50,000 or more population. In the tracted areas there 
are 14,500 tracts, accounting for a population of 62 million, or 41.1% of the total population of the 
United States. 

Census tracts are established by interested local groups on the basis of principles defined by the 
Bureau of the Census. In the 1910, 1920, and 1930 Censuses, the tracted cities had to meet the expense 
of the tract tabulations; in the 1940 and 1950 censuses the Bureau of the Census compiled and pub- 
lished population and housing data by census tracts as part of the regular reports. 

It was in 1930 that Howard Whipple Green of Cleveland first convinced the Bureau of the Census 
that the areas adjacent to the central tracted cities also should be tracted. By the 1940 Census, 25 cities 
had established tracts in their adjacent suburban areas and the Bureau of the Census decided that it 
would be desirable to extend the tract program to include the metropolitan districts of all of the tracted 
cities. In the 1950 Census, standard metropolitan areas, consisting of whole counties, were adopted for 
the presentation of statistics in lieu of metropolitan districts, and the tracting of the standard
metropolitan areas was adopted by the Bureau of the Census as a desirable goal. At present, there are 172 
standard metropolitan areas of which 47 are tracted in their entirety and 42 have the central city or the 
city and part of the adjacent area tracted. 

The Bureau of the Census has just completed an annotated bibliography, Census Tract Publications 
Since 1950, compiled from information received from key persons in the tracted areas. The bibliography 
demonstrates the extensive and diverse uses which are made of census tracts in marketing analysis, 
demographic, public health, and social programs, and planning and research activities. It is expected 
that the bibliography will further stimulate the use of census tracts in the localities in which tracts have 
been established and will show other cities the advantages to be derived from participation in the 
Census Tract program. 


Estimation by Least Squares and by Maximum Likelihood. Joseph Berkson, Mayo Clinic. 

The situation dealt with is given by: $P_i = F(x_i, \alpha, \beta) = F(Y_i)$ (1), where $P_i$ is the probability of an
event corresponding to $x_i$, $\alpha$ and $\beta$ are parameters to be estimated, $Y_i = \alpha + \beta x_i$ (2) is the linear transform
of (1), and $\hat{p}_i = F(x_i, \hat{\alpha}, \hat{\beta})$ (3) and $\hat{y}_i = \hat{\alpha} + \hat{\beta} x_i$ (4) are estimates of (1) and (2) respectively.

A class of least squares estimates is defined by minimization of (a) $\sum w_i (p_i - \hat{p}_i)^2$ or (b) $\sum W_i (y_i - \hat{y}_i)^2$,
where $p_i = 1 - q_i$ is an observed relative frequency corresponding to $P_i$ based on $n_i$ observations at $x_i$, $y_i$
is the linear transform value corresponding to $p_i$, $1/w_i$ is any asymptotically efficient estimate of the
variance of $p_i$, and $1/W_i$ is any asymptotically efficient estimate of the variance of $y_i$. Estimates falling in this
class are asymptotically efficient and therefore asymptotically equivalent to the maximum likelihood
estimate. If $F$ is the logistic function, $y_i$ the logit of $p_i$ and $W_i = n_i p_i q_i$, the estimate obtained is the “minimum
logit $\chi^2$ estimate”; if $F$ is the integrated normal curve, $y_i$ the normit (normal deviate) of $p_i$,
and $W_i = n_i z_i^2 / (p_i q_i)$ where $z_i = dp_i/dy_i$, the estimate is the “minimum normit $\chi^2$ estimate.”

The minimum logit $\chi^2$ estimate and the minimum normit $\chi^2$ estimate were compared with the 
maximum likelihood estimate for finite samples, in respect of the variance and mean-square-error of 
the estimates. For all situations investigated it was found that these least squares estimates have 
smaller variance and smaller mean square error than the maximum likelihood estimate. 
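
The minimum logit chi-square estimate can be sketched as a weighted straight-line fit: the observed logits are regressed on the dose variable with weights n p q. The code below is a minimal illustration under stated assumptions, not the author's computation: the function name and the data are hypothetical, and every observed proportion is assumed to lie strictly between 0 and 1 so that its logit exists.

```python
import math


def min_logit_chi2(x, n, r):
    """Weighted least-squares line through the observed logits.

    x: dose levels; n: numbers observed at each level; r: numbers responding.
    Weights are W_i = n_i * p_i * q_i. Returns (intercept, slope).
    Assumes 0 < r_i < n_i at every level.
    """
    p = [ri / ni for ri, ni in zip(r, n)]
    y = [math.log(pi / (1 - pi)) for pi in p]           # observed logits
    W = [ni * pi * (1 - pi) for ni, pi in zip(n, p)]    # weights n p q
    sw = sum(W)
    xbar = sum(w * xi for w, xi in zip(W, x)) / sw
    ybar = sum(w * yi for w, yi in zip(W, y)) / sw
    sxy = sum(w * (xi - xbar) * (yi - ybar) for w, xi, yi in zip(W, x, y))
    sxx = sum(w * (xi - xbar) ** 2 for w, xi in zip(W, x))
    slope = sxy / sxx
    intercept = ybar - slope * xbar
    return intercept, slope


# Hypothetical bioassay: three dose levels, ten subjects at each.
a, b = min_logit_chi2(x=[-1, 0, 1], n=[10, 10, 10], r=[2, 5, 8])
print("intercept:", a, "slope:", b)
```

Unlike maximum likelihood for the logistic model, this estimate needs no iteration, which is part of its practical appeal.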


Measuring the Effect of Unemployment Benefits on the Economy. Marvin K. Bloom, Research Council 
for Economic Security, Chicago, Illinois. 

The unemployment compensation system in the United States was designed to encourage stabiliza- 
tion of employment and to tide workers over limited periods of unemployment. Although there has 
been some dispute on its role in “maintaining purchasing power”, there is increasing interest in the 
system as an “automatic stabilizer”. The system clearly meets three criteria for an “automatic stabilizer” 
for the following reasons: it goes into action automatically; on an annual or quarterly basis since 1940, 
unemployment compensation (benefits less taxes) added to purchasing power during periods of slump 
and subtracted from it during high prosperity. In the first quarter of 1954, state benefits exceeded state 
taxes by more than one-third of a billion dollars. 

It is harder to determine the extent to which unemployment benefits reduce the public’s demand 
for cash in a slump. Annual benefits have never exceeded 2.3 per cent of taxable wages or 0.9 per cent 
of disposable personal income. More important is the proportion of income loss offset by unemployment 
benefits. Four broad classes of benefit-income loss measures are presented. Two classes are based on 
gross wage loss, two on net wage loss. Recent estimates of the proportion of gross wage loss offset by 
benefits range from 18 to 25 per cent. Estimates of the net income loss offset by increases in benefits in 
the 1948-1950 decline, based on quarterly data, range from 5 to 25 per cent. The analysis is extended to 
selected labor-market areas to show the differential impact of income changes and benefit payments on 
different segments of the population. Here, the relative offset ranged from 3 to 25 per cent. The following 
factors account for differences in the ratios: level of benefits provided; the income base used; method 
of computing additional benefits; use of unadjusted or seasonally-adjusted figures; the relative income 
loss; the different components of the income loss. 

The following concepts are discussed: What constitutes a wage loss for the economy? Is net wage 
loss a better base than gross wage loss? What are the effects of divergent movements in the components 
of income on consumption? What happens to the benefits that unemployed workers receive? The fragmentary
data in this area are summarized. Attention is drawn to pilot studies by the U. S. Department 
of Labor of the income and expenditures of claimants’ families. A research program is suggested which 
coordinates such studies with continuous work histories of samples of covered workers and beneficiaries. 


Canada and the Outside World. C. D. Blyth, Dominion Bureau of Statistics, Ottawa. 

Inter-relationships between Canada and the United States as shown in the Canadian balance of 
payments are numerous and far reaching. An outstanding difference between Canada and the United 
States is the relatively greater importance of foreign trade in the case of Canada. Foreign transactions 
make up about one-quarter of Canada’s gross national expenditure compared with around 5 per cent 
in the case of the United States. A very large and growing share of this Canadian trade is with the 
United States, Canada being the best customer of that country as well as the leading source of supply. 
The close connection with United States demand is one of the most direct contacts which the Canadian 
economy has with the United States. Many of the basic Canadian primary industries supply the 
United States. 

There is also an influential chain of inter-relationships arising from the large number of Canadian 
industries which are controlled by United States corporations. These constitute about one-fifth of the 
investment in Canadian industry and commerce and in manufacturing they represent not far from 
one-third of the capital invested, and make for increasing interpenetrations between the United States 
and Canadian business communities. Security markets are also closely inter-related, and there is also 
an unprecedented movement of people across the border. 

But the Canadian economy has an existence distinctly separate from that of the United States. Among 
the factors differentiating Canada from the rest of North America are affiliation with the Commonwealth 
and the relatively greater extent of Canadian relations with the overseas world. The rapid rate of 
growth in Canada’s domestic economy has also provided increased strength and a momentum to 
Canadian activity, but external transactions remain of special importance. 

In relation to the overseas world, Canada is now one of the “have” countries. Canadian participa- 
tion in the recent war and subsequent loans and contributions to overseas countries in the post-war years 
illustrate Canada’s economic strength, and post-war growth has further added greatly to industrial 
development. This new situation contrasts with the position of economic dependency in the early decades 
of the century. While the net foreign indebtedness of Canada has risen since 1950, it is less in real terms 
than before the war and capital from Canadian sources has financed much the largest part of new in- 
vestment. But the existence of deficits in Canada’s current account in recent years arising from large 
expenditures in the United States and the continued debtor position of the country have placed limita- 
tions on the extent to which Canada can provide overseas aid. 

Finally, comment is made on the partly unexplored frontiers of international financial statistics— 
short-term movements of capital. The elusive and complex character of this field has provided a baffling 
problem of measurement for all countries which are concerned with it. 


State Patterns and Speed of the Labor-Force Shift from Agricultural to Nonagricultural Industries in 
the United States, 1870-1950. C. P. Brainerd, University of Pennsylvania. 


The pattern and rate of the labor-force shift produced by divergent developments in agriculture 
and in industries are analyzed in terms of the “nonag percentage,” or the number of nonagricultural 
workers in each state taken as a percentage of the total labor force in that state, and used as an index 
of economic maturity. The dominant course of development is a strong upward movement with no 
reversals, but three subordinate regional patterns are distinguished within the main one. 

Nonagricultural workers increased their share of the labor force from 48 to 88 per cent in 80 years. 
Using point change (increment) to measure rate of change, southern states changed the fastest, central 
and western states came next, and northeastern states changed most slowly. But measuring rate of 
change in terms of “rate of approach” to a hypothetical maximum of 100 per cent nonag, the industrial 
states of the northeast and central regions moved fastest towards the limit, while southern and mid- 
western farm states moved the slowest. Although the country as a whole made the transition at an even 
pace by either measure, a majority of the states changed faster before 1910 than later, but those that 
changed faster after 1910 made striking gains. Basing the rate of approach on the potential change as 
of 1910, the transition was much accelerated in the second period. The higher a state’s level of maturity 
in 1870, the smaller the net change in the percentage by 1950, but the higher the rate of approach to 
the maximum. Level of start was not, however, a limiting factor until after 1910. 
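
The two measures can rank the same states in opposite order, which a small calculation makes concrete. The abstract does not give a formula for "rate of approach"; the sketch below assumes it is the point change divided by the distance remaining to 100 per cent at the start of the period. The national figures (48 to 88 per cent) are from the text, while the two state examples and the function names are hypothetical.

```python
def point_change(start_pct, end_pct):
    """Increment in the nonag percentage, in percentage points."""
    return end_pct - start_pct


def rate_of_approach(start_pct, end_pct, ceiling=100.0):
    """Share of the remaining distance to the ceiling that was covered.

    Assumed definition: (end - start) / (ceiling - start).
    """
    return (end_pct - start_pct) / (ceiling - start_pct)


examples = {
    "United States (from the text)": (48, 88),
    "hypothetical farm state": (20, 70),
    "hypothetical industrial state": (70, 95),
}

for name, (start, end) in examples.items():
    print(name, point_change(start, end), round(rate_of_approach(start, end), 3))
```

Under this assumed definition the hypothetical farm state shows the larger point change while the hypothetical industrial state shows the higher rate of approach, the same reversal of ranking that the abstract reports between southern and northeastern states.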

The shift away from agriculture was not directly connected with population growth, but until 
1910 was definitely associated with immigration and with rate of growth in number of manufacturing 
wage earners. Decline of the agricultural labor force in the east appears first to have retarded the 
transition in the west and later to have maintained it in the east. In the south, transition was slow until 
the number of agricultural workers began to decline. 


Census Data for Units of School Government. Henry M. Brickell, Manhasset Public Schools, New 
York. 


The basic unit of school government in the United States is the local school district. Because of the 
limitations the typical state has placed upon its own educational authority, educational control and the 
main burden of educational financing fall within the province of the local district. Accordingly, the 
person engaged in educational research must look within each school district for the factors which make 
its schools differ from the schools of other districts. 

A dozen of the questions usually asked by a census enumerator give the key to most of the differences.
Adequate follow-up of these leads is being slowed somewhat by the inadequacy of available 
census data, but slowed much more severely by the fact that information is generally not available by 
school districts. 

There are three major factors in the school setting which have arrested our attention. The first is 
the ability of the people to pay school taxes. We need to know the total value of property and income. 
The second major factor is population characteristics. Past study points to nativity, education, occupa- 
tion, and other items of personal background as related to the kind of schools operated in the district. 
The third factor demanding study is the type and intensity of group organization within a community. 

It is not the research man alone who is interested in the census. Those responsible for the day to 
day operation of the schools are year-round users of the kind of data gathered by the censuses of popula- 
tion and housing. Moreover, if census data were available by the governmental units with which school 
officials are concerned, its utility would be multiplied. 

Recommendations: (1) Data collected by the Population and Housing Division should be published 
or made readily available by the local governmental units operating public elementary schools. (2) The 
same summary information should be published for all districts regardless of size or location so that 
comparisons between districts of any size over the nation are possible. (3) In intercensal years local 
school districts or other governmental units should have access to the standard questions, instructions 
to enumerators, and canvassing methods used by the Bureau so that they can gather local data compar- 
able to that collected during the Bureau’s sample censuses and sample surveys. (4) The Bureau should 
encourage interested organizations to advertise and popularize available data. 





Statistical Methods in Meteorology. Glenn W. Brier, U. S. Weather Bureau. 


Several examples are given of techniques devised to overcome some of the statistical problems en- 
countered with weather data or other time series. The first topic discussed is the length of record 
necessary to determine a “normal”, and a simple criterion for making a rational choice is given. Two 
other examples are concerned with the problem of testing the significance of “singularities” in weather. 
A description is given of a sampling experiment with random numbers which formed the basis of a test 
on rainfall data. A non-parametric test is proposed for use in testing the significance of the relationship 
between two time series. 


The Flow of Funds Approach to Savings and Investment. Daniel H. Brill. 

This paper is concerned with data requirements for analysis of saving and investment. First, the 
paper deals with inadequacies in the definitions underlying concepts of saving and investment in the 
national income accounts. These inadequacies are (a) differences in scope of activities classified as saving 
from one sector to another, making meaningful comparisons within or among economies very difficult, 
and (b) consolidation of all investment activities into one account, thereby preventing intersector com- 
parisons and precluding analysis of the influence of financial variables, such as debts or liquid assets, in 
saving and investment processes. 

Next, the paper discusses a new system of accounts, now in preparation, which may be of use in 
these problems. The system, known as “flow of funds accounts”, leans heavily on the income and 
product structure for data. The flow of funds system attempts to record all uses of money and credit by 
each of the major sectors of the economy, whether for goods or services, capital or current account, 
financial or non-financial activities. Each sector’s sources and uses of funds are classified in categories 
of activity carried through the system consistently. No concept of savings or investment is identified in 
the accounts; detail on different types of transaction is given to permit combinations of data into various 
formulations as may be required for analysis. 

Since the system measures flows of funds, internal bookkeeping allocations (such as charges to 
reserves or interplant transfers within a single enterprise) are not recorded. Further, it is necessary to 
measure flows of funds at the values at which the flows occur. Some internal allocations, particularly 
those for depreciation or tax reserves, are important determinants of business investment behavior. 
Differences between book- and market-values may also be important in analyzing such behavior. 

These deficiencies emphasize the need for development of accounting systems which can encompass 
internal transactions as well as intertransactor flows, and book- as well as market-values. Finally, it is 
necessary to investigate much more intensively the relationship between proprietors of unincorporated 
businesses as entrepreneurs and as consumers. It is particularly difficult to separate the personal and 
the business saving of this group, or even to determine whether such separation would be significant 
analytically. 


Recent Advances in Government Statistics. Robert W. Burgess, Bureau of the Census. 


Most of the advances to which I shall call attention do not concern subject matter, but consist 
rather of: (1) A better understanding of the way in which good government statistics are and can be 
used by government, business and social scientists. (2) The development of more efficient and economical 
methods including, for instance, greater use of administrative records as raw material, greater use of 
scientific sampling procedures, and employment of electronic computing equipment. (3) Better coordina- 
tion of the statistical results derived by one government agency with related results derived by another 
agency. (4) Prompter collection and publication of results. And (5) More careful definition and computa- 
tion of familiar statistical measures with more attention to establishing margin of sampling variation 
and degree of statistical reliability. 

In addition to these advances in methods and procedures, the Census Bureau has initiated 
or is initiating advances concerned with subject matter, as for instance: (1) Water usage by manufacturing 
concerns. The 1953 Annual Survey of Manufactures is incorporating and the 1954 Census of Manufactures
will incorporate material on the amount of water used in a year by manufacturing establishments.
(2) Commodity Flow. Managers of sales and production have long wished they had better information
on shipments and stocks at various stages of the flow of each commodity from producer to 
ultimate consumer. The Bureau is exploring, for a limited number of commodities, the feasibility of 
assembling such figures on a useful basis. (3) “Fix-it” Expenditures. Collection from sample households
of monthly expenditures for repairs and alterations, a project intended to fill a conspicuous gap 
in construction statistics. (4) Statistical relations of establishment, company and enterprise. The Censuses 
of Manufactures and Business provide data based on establishments; the Bureau of Internal Revenue 
deals with profits of companies or sometimes, in the case of consolidated returns, of enterprises, some- 
times embracing companies in various different lines of activities. Determination of aggregate sales, 
employment and other statistics of establishments, companies and enterprises on a comparable basis, 
would facilitate an understanding of the structure of our economy and improve present statistics of 
national income and gross national product. I believe that Census Bureau records can provide some 
valuable material bearing on such interrelations and could provide more by relatively simple extensions 
of our inquiries on certain points. 


From Engineering to Engineering Statistics. Irving W. Burr, Purdue University. 


Large numbers of engineers and industrial personnel have been taking courses in statistical quality 
control. These courses are of several kinds: intensive courses of eight or ten days and part-time courses, 
either for those in one organization or open to all. These may be in colleges or universities, or privately 
sponsored. The trend in training is ever upward, as seen in the growth of the American Society for 
Quality Control to a membership of over 8,000 in about ten years. 

There is a place for every possible gradation between those with a single intensive or part-time 
course, to those with a doctor’s degree in mathematical statistics. Some of the people most useful to 
industry are those with a B.S. in an engineering field, some practical industrial experience and a degree 
or two in mathematical statistics. 

Courses for the engineer should stress applications and experiments, particularly the first course 
and probably also the second. Those with greater mathematical maturity will be ready for and welcome 
considerable theory in subsequent courses. 

The long-run objective is to get statistical training into all engineering curricula, not only in the 
form of courses in statistics, but also throughout the engineering and science courses, so that whenever 
a statistical problem comes up, as in laboratory data or measurement precision, appropriate methods 
will be used. Even when this is accomplished, there will always be need for training those who never 
went to college or even high school, and also need for additional training for those who want more 
statistics. Such training will, of course, only be a supplement to the person who is studying on “his own” 
as many do. 


In the Statistical Class Room. Irving W. Burr, Purdue University. 


There are three general and interrelated problems in statistics for physical scientists. The first is 
that of selling them on wider use of statistics. The fundamental core is that statistics is the science of 
analyzing problems involving variation. Since physical scientists tend to encounter more and more 
variation as their experiments become more refined, statistics is becoming a basic tool. They must also 
be shown how easily, without proper design in their experimentation, they can reach wrong conclusions 
and waste much time. 

The second problem is that of how to teach physical scientists. Shall the emphasis be on theory and 
derivations, upon experiments to illustrate the theory, or upon applications using physical science data? 
The writer believes that in a first course the emphasis should be upon the latter two, despite the fact 
that most physical scientists have the mathematical background to “take” theory. There is not time for 
all three phases of teaching. Experiments and applications are more likely to foster that precious quality 
“statistical thinking.” This we must have, because the one course may be our only chance. 

The third problem is that of what to teach. Here the aim should be to present significance of differences,
interval estimation, analysis of variance, linear correlation, curve-fitting and design of experiments
as soon as possible, for these are the tools for which the physical scientist can see a use in his 
work. The menu can be rounded out in a second course. 

Throughout the course the aim should be sell, sell, sell! The more the statistician knows of physical 
scientists’ problems the better he can teach them. 


Optimal Filtering as Statistical Decision. A. George Carlton, The Johns Hopkins University. 

The problem of filtering, i.e., estimating present value of a message random process on the basis 
of observation of the past of the input process made up of the sum of the message process and a noise 
process, is considered. The Wiener optimal realizable linear filter is exhibited. Minimax $\sigma^2$ realizable 
linear filters are considered. It is shown that with message and noise spectra given, and process distribu- 
tion functions only slightly further restricted, the minimax realizable filter is the optimal realizable filter. 
The minimax realizable filter is derived for cases in which either or both of the spectra are subject only 
to a linear restriction of the form $\Phi(\omega) = C(\omega) f(\omega)$, where $C(\omega)$ is a specified function designated the 
spectral capability function and $f(\omega)$ is any symmetric probability density function. Simple special cases 
are derived explicitly. It is pointed out that in many cases one can design subminimax adaptive filters 
which possess all the merits of Wiener-type optimal linear filters based on spectra estimated from the 
observations and furthermore, adapt reasonably well to nonstationary and non-normal processes, and 
that in some cases these adaptive filters are most naturally constructed in feedback form. 


Change of Composition Estimates of Fish Populations. Douglas G. Chapman, Oxford. 

The theory of the method of population estimation based on the change of composition of the 
population due to a selective removal is studied in some detail. The author has given the maximum 
likelihood estimates earlier [“The estimation of biological populations,” Annals of Mathematical Statistics,
25 (1954), 1-15] under the assumption of random sampling, so that a binomial model is valid. Some of 
the shortcomings that occur in the practical application of this method are here considered in their 
effect on the estimation procedure. In general the procedure appears to be fairly insensitive to failure of 
the assumptions. The method is compared with the more usual tag sample estimation procedure, in 
terms of the information obtained for the same amount of effort. It appears that the tag sample method 
yields more information but at the same time is more likely to give unsatisfactory results due to failure 
of the underlying assumptions. 

Also considered is an estimation procedure based on a combination of tag sample and change of 
composition methods. In one important case at least the maximum likelihood equations are easily 
solved. Some answers are given to questions of optimum sample design where the binomial model is 
assumed. An estimation procedure is outlined for the case where the sampling is such that the binomial 
model is not acceptable. 
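
The idea behind the change of composition method can be shown with the familiar two-class estimator. The abstract itself gives no formula, so the sketch below is an assumption about the simplest case: the population consists of two classes, p1 and p2 are the sampled proportions of the first class before and after a known selective removal, and the initial population size is estimated as (R_x - R p2) / (p1 - p2), where R_x of the R animals removed belong to the first class. The function name and the numbers are hypothetical.

```python
def change_in_ratio_estimate(p1, p2, removed_x, removed_total):
    """Estimate the population size before removal.

    p1, p2: proportion of class x before and after the removal.
    removed_x: number of class-x animals removed.
    removed_total: total number of animals removed.
    """
    if p1 == p2:
        # The method needs a selective removal that changes the composition.
        raise ValueError("composition did not change; estimate is undefined")
    return (removed_x - removed_total * p2) / (p1 - p2)


# Hypothetical population of 1000 fish, 400 of class x.
# Removing 200 of class x and 50 others leaves 200 of class x among 750.
p_before = 400 / 1000
p_after = 200 / 750
print(change_in_ratio_estimate(p_before, p_after, removed_x=200, removed_total=250))
```

With exact proportions the estimator recovers the true size; in practice p1 and p2 are sample estimates, and the precision depends heavily on how far the removal shifts the composition.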


Statistics in Colleges and Universities of the South. H. H. Chapman. 

The Inventory of Instruction, Research and Service in Statistics was sponsored by the Southern 
Regional Education Board. Information was collected largely by questionnaires but was supplemented 
by an examination of college catalogues and by personal visits to institutions. A total of 272 institutions 
in the fourteen states served by the Southern Regional Education Board were invited to participate; 
questionnaires were sent to 209 and replies were received from 193. A preliminary report gives detailed 
information on the courses taught in the several institutions, the laboratory and other facilities avail- 
able, the place that statistics occupies among the elective or required courses in various curricula, pro- 
grams for training statisticians, research and service activities undertaken, and organization for in- 
struction, research and service in statistics. The final report will be published by the Southern Regional 
Education Board. 

The data collected indicate a wide diversity of offerings. The mathematical preparation for the introductory
courses ranges from no formal requirement to calculus. For the more advanced courses a 
knowledge of calculus is ordinarily required. To a very large extent the introductory courses are given 
in subject matter departments and are designed primarily for students majoring in the given depart- 
ment. Centralized statistics departments were reported in only three institutions although some type of 
centralized control or coordination was reported in several of the other larger institutions. The reports 
indicate that many of the institutions have made substantial progress in providing the library facilities 
and the equipment needed for advanced graduate instruction and for research and consultation service. 

An analysis of the programs in operation and the opinions expressed indicates decided differences 
in thinking concerning the most desirable lines of development. Some are primarily interested in the 
training of statisticians, tend to place the emphasis on statistical theory, and are distressed by the 
numerous offerings of introductory courses by subject matter departments. The majority, however, are 
most directly concerned with the application of statistical methods to the solution of problems in a 
chosen subject-matter field.


Pensions—A Stabilizing Influence on Consumption. Miriam Civic.


The present and potential effects of pension payments on the economic stability of the country 
have received little of the attention due them. This paper is an initial step in measuring the actual and 
relative dimensions of pensions today, and in tracing their growth from a half dozen years ago to a half 
dozen years hence. It highlights the inadequacy of available statistics for appraising the stabilizing 
effects of pensions. 

Pensions from all sources amounted to $7½ billion in 1953 compared with less than half this sum in
1947. The author estimates that by 1960 they will have doubled. Since most pension income is spent, 
it is interesting to note that pensions represented about 2% of total consumption expenditures in 1947 
and a little over 3% in 1953. They will come close to 5% in 1960, under conditions of relatively high 
employment. 

Three-fifths of pension outlays went to persons 65 or older. The rest were paid to younger persons 
on account of disability or survivorship. Expanded pension programs since 1947 have had noticeable 
effects on the sources and amount of income received by the aged, the proportion of older persons “with 
income”, and the number of new households formed by aged individuals. Characteristics of the aged as
a group, which indicate that pension income will be rapidly spent, include their relatively high net
worth and home ownership; favorable tax status; small fixed commitments; and low rate of savings. 
Empirical data are badly needed to show how the aged actually spend, or save, their income. 

Inherent in all pension programs is an expansion factor which will be responsive to economic decline.
It consists of pensionable persons who are able to hold jobs when business conditions are good, but who 
will leave the labor market during a setback. Such a development was observed between January, 1953 
and January, 1954, when 200,000 aged workers who were eligible for social security withdrew from the 
labor force to collect benefits. But possible expansion is less important than the certainty that pension
income will at least hold steady, when other types of income decline.

Some Methods for Strengthening the Common $\chi^2$ Tests. William G. Cochran, Johns Hopkins University.

Since the $\chi^2$ tests of goodness of fit and of contingency tables are not directed against any particular
alternatives to the null hypotheses, it is often advisable to replace or supplement them with more specific
tests. This paper gives a review, with illustrative examples, of some common methods of strengthening
$\chi^2$ tests in this sense. There may be a substantial increase in power in the $\chi^2$ test itself by the use of
small expectations at the tails. An example which brings out this point is discussed.

An alternative to $\chi^2$ that will often detect departure from the null hypotheses is the comparison of
low moments of the observed and theoretical distributions, e.g., by means of the “variance” test for the
binomial and Poisson distributions and the tests of skewness and kurtosis for the normal distribution.
The sum of squares in a variance test may also be broken down to give more specialised tests. Another 
possibility is to test some selected linear function of the deviations between observed and expected num- 
bers in a goodness of fit test. 
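
As a minimal sketch of the “variance” test in the Poisson case, the statistic below is the sum of squared deviations of the counts from their mean, divided by that mean, and is referred to a $\chi^2$ distribution with one fewer degrees of freedom than there are counts. The function name and the counts are hypothetical, invented only for illustration; they do not come from the paper.

```python
def poisson_variance_test(counts):
    """Return (statistic, degrees of freedom) for the Poisson variance test.

    The statistic sum((x - mean)^2) / mean is referred to a chi-square
    distribution with len(counts) - 1 degrees of freedom.
    """
    n = len(counts)
    mean = sum(counts) / n
    statistic = sum((x - mean) ** 2 for x in counts) / mean
    return statistic, n - 1


# Hypothetical counts, invented purely for illustration.
counts = [2, 4, 3, 9, 1, 5, 0, 8]
statistic, df = poisson_variance_test(counts)
print(f"statistic = {statistic:.2f} on {df} degrees of freedom")
```

A large value relative to the tabulated $\chi^2$ point suggests more variation among the counts than a single Poisson distribution would produce.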

Some recent methods for subdividing the degrees of freedom in the $\chi^2$ test for a two-way contingency
table are also relevant. These may be particularly useful when one or both of the classifications in the 
two-way table is ordered. 

Some methods are presented for handling the fairly common problem of combining results from a
number of independent two-way tables.


Needed Improvements in the Federal Statistical Programs for City Planning Purposes. Henry Cohen,
Department of City Planning, New York City.

The recent proposal by Congressman J. Arthur Younger (Rep. Calif.) to set up a federal Depart- 
ment of Urbiculture or Urban Affairs requires serious consideration. The paper traces the history and 
evolution of our governmental structure, with particular reference to the establishment of major 
Cabinet-rank departments, and shows how slow the country has been in according full recognition to 
major functions. 

City planners are concerned with the physical planning of urban communities containing in excess 
of half of the nation’s population; in which the real estate assets probably equal $400,000,000,000; in 
which probably over $200,000,000,000 of the national income was produced in 1953; and in which over 
$9,000,000,000 was spent in 1953 on residential and non-residential construction. 

The recent report of the Intensive Review Committee, headed by Ralph J. Watkins, recommended 
two quinquennial Censuses of Agriculture at an estimated cost of $46,000,000 per decade. This is con- 
siderably more than the costs of the recommended Censuses of Business and Manufactures, which cover 
mainly urban areas. 

One of the dimensions of urban life which has been largely ignored and neglected in the federal 
statistical programs is the physical and spatial basis. 





The eighteen largest cities in the country, each with half a million people or more, contained a total 
of 26,591,395 people in 1950 compared to a total farm population in the nation of only 25,058,000. The 
urban population concentration of approximately 11,800 persons per square mile is 850 times greater 
than the average farm density of 13.8 persons per square mile. The organization and management of 
activities which are so heavily concentrated in small areas is an undertaking of the greatest difficulty. 
This planning and management task can hardly be handled satisfactorily without detailed data showing 
the characteristics, intensity of concentration, and location of the varied urban activities and functions. 

Business and industrial establishments (a) have structural characteristics, (b) are located in a
physical environment, and (c) exist in a pattern of spatial relationships to their markets; their supply,
servicing, and financing sources; transportation; and competing establishments. These physical factors
and relationships, neglected in the federal economic censuses, are important and measurable. A man 
from Mars studying our society from Census sources would be impressed with the tremendous economic 
activity carried on by millions of people in a spatial setting which apparently has no physical dimen- 
sions. 

One of the other major shortcomings in the federal economic censuses (Business and Manufactures) 
has been the failure to report information for local areas in large cities. Though there are separate re- 
ports for urban places of 10,000 or more population, Manhattan, for example, with 1,954,000 resident 
population and 2,466,000 workers employed on the island, is provided with no separate reporting, as a 
matter of course, for any of its economic districts, e.g., the garment district, the port areas, etc. 


Southern Regional Graduate Summer Sessions in Statistics. Gertrude M. Cox, Institute of Statistics,
Raleigh, North Carolina.

The Southern Regional Education Board is a public educational agency created in 1948 by the 
Southern Regional Education Compact among the 14 southern states and is dedicated to the improve- 
ment of graduate and professional education in the region. A general coordinated statistics program is 
being developed in the southern region with the following objectives: (1) to promote the use of efficient 
statistical techniques in all fields of research, (2) to provide assistance in planning surveys and in design- 
ing experiments, (3) to develop a higher quality and greater availability of teaching, research and con- 
sulting service in statistics, and (4) to advance statistics through the discovery of new techniques by 
theoretical investigations. 

At the first conference, the group considered the need of existing statistical personnel for additional 
training, the possibilities of coordinating statistics curricula, the status of consulting services and of 
contract and cooperative research, and the basis for a regional program in statistics. 

An Advisory Commission on Statistics was appointed and met April 19-20, 1953. The Advisory 
Commission made plans to initiate a summer session program. It was proposed that those universities 
having the facilities consolidate their efforts by holding a joint series of summer sessions of six weeks’ 
duration each. The first “Southern Regional Graduate Summer Session in Statistics” was held June 9 to 
July 17, 1954 at Virginia Polytechnic Institute. 

Graduate students made up two-thirds of those with university affiliation. The universities pro- 
vided 80% of the total group, government had 14% and the remaining 6% were from industry. As 
planned, the session was of greatest service to university people. 


Optimum Allocation in Two-Stage Cluster Sampling using Call-Backs on Non-Respondents. John E.
Down, Cornell University.

A model developed by Deming (Jour. American Stat. Assn., 48, 1953) which allows the calculation
of the variance of response and bias of non-response when a simple random sample is drawn, is extended 
to the case of two-stage cluster sampling. 

It is assumed that each individual in the population has a certain probability of being interviewed 
successfully, and that for convenience of identification and computation the individuals’ probabilities 
may be grouped into 5 mutually exclusive classes designated by probabilities $p_i = i/4$ ($i = 0, 1, 2, 3, 4$).

It is also assumed that the proportion of individuals falling into each response class will vary from 
cluster to cluster; the proportions being in some way related to other cluster characteristics (e.g., 
income, average rent paid, etc.) 

Expressions are then developed for the sample mean, expected value and variance of the sample 
mean, and the bias of non-response, for both the initial attempt and the combination of subsequent call- 
backs with the initial attempt. 

By assuming a cost function arising from various components of travel and interviewing costs, an 
investigation is made as to the optimum manner of allocating resources among clusters in the initial 
and subsequent attempts at obtaining interviews. The allocation of resources is said to be optimum if it 
produces a minimum mean square error subject to fixed costs.

The mean square error of the sample mean here refers to the variance of the sample mean plus the 
squared bias of non-response. 

An expression is also developed for the expected value and variance of the estimate of the population
mean when the Politz-Simmons method (Jour. American Stat. Assn. 44, 1949) for eliminating call-backs 
is used. 

For a certain range of values of the assumed cost function and certain other population constants
the relative precisions of the call-backs scheme and the Politz-Simmons scheme are compared.
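
The way call-backs reduce the bias of non-response, and how that bias enters the mean square error, can be sketched numerically. The sketch assumes five response classes with per-attempt success probabilities 0, 1/4, 1/2, 3/4 and 1, independent attempts, and a simple large-sample respondent mean rather than the two-stage cluster expressions of the paper. The class shares, class means and function names are hypothetical.

```python
from fractions import Fraction as F

# Assumed per-attempt success probabilities for the five response classes.
P = [F(i, 4) for i in range(5)]


def reach(p, attempts):
    """Chance that a person with per-attempt probability p is interviewed
    within the given number of attempts (attempts assumed independent)."""
    return 1 - (1 - p) ** attempts


def respondent_mean(shares, means, attempts):
    """Large-sample mean among those interviewed within `attempts` tries."""
    weights = [s * reach(p, attempts) for s, p in zip(shares, P)]
    return sum(w * m for w, m in zip(weights, means)) / sum(weights)


def bias(shares, means, attempts):
    """Non-response bias: respondent mean minus the population mean."""
    population_mean = sum(s * m for s, m in zip(shares, means))
    return respondent_mean(shares, means, attempts) - population_mean


def mean_square_error(variance, bias_value):
    """Mean square error = variance of the sample mean + squared bias."""
    return variance + bias_value ** 2


# Hypothetical population: equal class shares, class means rising with p.
shares = [F(1, 5)] * 5
means = [10, 20, 30, 40, 50]
for attempts in (1, 2, 3):
    print(attempts, float(bias(shares, means, attempts)))
```

In this invented population the bias shrinks with each round of call-backs but never vanishes, because the class with probability zero is never reached.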


Factorization of Ethnographic Data. Harold E. Driver and Karl F. Schuessler.


Although indexes of interrelationships among ethnographic variables have been arranged in square 
matrices since Boas wrote in 1895, clusterings of variables were determined only by inspection until 
very recently. In 1954 Clements (American Anthropologist, 56, 1954) published the first cluster analysis 
of a set of intercorrelations among cultural groups. 

Factor analysis has not been previously applied to ethnographic data, although it has been in- 
creasingly employed in physical anthropology in the last 15 years. The present employment of the 
centroid method of factorization is experimental. Factor analysis is regarded by some authorities as 
more economical than cluster analysis in that fewer factors are needed. 
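
For readers unfamiliar with the centroid method, the following is a minimal sketch of how first-factor loadings are usually extracted: each loading is a column sum of the correlation matrix divided by the square root of the sum of all entries. The matrix is hypothetical, its diagonal is assumed to hold communality estimates, and the function names are illustrative; the summary itself gives no computational details.

```python
import math


def first_centroid_factor(r):
    """First-factor loadings by the centroid method.

    Each loading is the column sum of the correlation matrix divided by
    the square root of the sum of all its entries. The diagonal of `r`
    is assumed to hold communality estimates.
    """
    total = sum(sum(row) for row in r)
    return [sum(row[j] for row in r) / math.sqrt(total) for j in range(len(r))]


def share_of_communality(r, loadings):
    """Fraction of total communality (the diagonal) carried by the factor."""
    return sum(a * a for a in loadings) / sum(r[i][i] for i in range(len(r)))


# Hypothetical intercorrelations among four groups, for illustration only.
r = [
    [0.80, 0.60, 0.50, 0.40],
    [0.60, 0.70, 0.50, 0.30],
    [0.50, 0.50, 0.60, 0.30],
    [0.40, 0.30, 0.30, 0.40],
]
loadings = first_centroid_factor(r)
print([round(a, 3) for a in loadings], round(share_of_communality(r, loadings), 3))
```

A first factor with all loadings positive and a large share of the communality is the pattern the authors report for their first, unrotated factor.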

Using the same data (on 16 tribes in northwestern California) as that employed by Clements, the 
authors found that a first factor, without rotation, is positive for all tribes and accounts for 76% of the 
total communality. This common influence which appears to dominate all others in the area may be 
called Northwest California Culture. Factor II appears less powerful, the two highest loadings being 
negative and referring to two adjacent tribes in the area transitional between northwest California and 
central California. The highest positive loading is that of the Hupa tribe located in the heart of north- 
west California. We may label this factor Hupa vs. Transitional. The third and fourth factors, by order 
of extraction, have loadings of about the same magnitude as factor II, and are tentatively designated as 
Central California vs. Transitional and Yurok-Karok, and Coast Yuki vs. Hupa-Van Duzen-Sinkyone 1. 

The first two factors are completely in accord with the accepted ethnological picture as outlined by 
Kroeber and others. Northwest California Culture is obviously the dominant one in the area as a whole, 
and the contrast between Northwest and Transitional is equally unquestioned. The distinctiveness of 
Central California in the third factor conforms to preconceptions, but the similarity of Yurok-Karok 
with Transitional in the negative loadings seems strange. Similarly the isolation of Coast Yuki in the 
fourth factor is not too difficult to envisage, but the grouping of Hupa with Van Duzen and Sinkyone 1 
is unorthodox. 

Although the application of factor analysis has not provided a simple “explanation” of the
similarities and differences among the tribes studied, it has questioned older interpretations and suggested
lines around which new ones might be formed.


Social and Economic Characteristics of the Population in the United States Directly Dependent on
Agriculture. Louis J. Ducoff, U. S. Department of Agriculture.

This paper analyzes some results of a special project which collated for a sample of farm-operator 
families and households in the United States information from the 1950 Censuses of Agriculture, Popu- 
lation and Housing. Three categories of degree of dependence on agriculture were used in this analysis 
for farm operator households: (1) the wholly dependent on agriculture, (2) the partly dependent with 
agriculture as the major source of the family’s income and (3) the partly dependent with nonagriculture 
as the major source. Only about two million farm-operator families, or 38 per cent of the total, fell in 
the category “wholly dependent on agriculture.” The proportions of farm operator families in categories 
(2) and (3) were 27 per cent and 30 per cent, respectively. 

It is found that even among “commercial” farms, only half of the operator families were completely 
dependent on agriculture. An additional 35 per cent of the “commercial” farms were partly dependent 
with agriculture as the major source of the farm operator's family income. Among the “noncommercial” 
farms, which make up 29 per cent of all the farms in the United States, 80 per cent had nonagriculture 
as the major source of income. 

The age-sex composition of the population in farm-operator households and the average educa- 
tional level of farm operators are fairly similar among the three categories of degree of dependence on 
agriculture. The nonwhite population of farm-operator households was more than proportionately 
represented among the wholly dependent on agriculture. Fertility ratios were lowest for farm-operator 
families that were mainly dependent on nonagricultural incomes. The occupational composition of the 
employed population in each of the three categories shows sharp contrasts and suggests that the classifi- 
cations used in the analysis achieve with reasonable adequacy an identification of the population wholly 
or primarily dependent on agriculture. 


Simultaneous Confidence Intervals Derived From Multiple Range and Multiple F Tests. D. B. Duncan 
and R. G. Bonner. 


Two new methods are proposed for estimating confidence intervals for comparisons among $n$
means $\mu_1, \ldots, \mu_n$. The first is for differences between single means and is based on the new multiple
range test (Duncan, Biometrics, 1955, to be published). The second is for all comparisons of the form
$\sum k_i \mu_i$, where $k_1, \ldots, k_n$ are arbitrary constants such that $\sum k_i = 0$, and is based on the multiple
comparisons test, Duncan (Va. Jour. of Sci., 1951). The new intervals are similar respectively to the
“allowances” proposed by Tukey (1951) and “contrasts” proposed by Scheffé (Biometrika, 1953) but employ
a new principle. A $p$-mean joint confidence coefficient is defined for every subset of $p$ means, $p = 2, \ldots, n$,
as the probability of being correct about all intervals involving those $p$ means. A set of intervals with
confidence coefficient $\beta$ is taken to be a set for which the $p$-mean joint confidence coefficients are at
least $\beta^{p-1}$, $p = 2, \ldots, n$, the latter values being termed joint confidence coefficients based on degrees of
freedom. The use of these special values gives the new methods uniformly shorter intervals than those
of the comparable procedures and their appropriateness is discussed. Tables for the new methods are
provided in the references (Duncan 1951, 1955).
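
To see how quickly the required joint confidence coefficients fall as the subset size grows, the short sketch below tabulates the lower bound, read here as $\beta^{p-1}$, for each subset size $p$. The values $\beta = 0.95$ and $n = 5$ are illustrative choices, and the function name is hypothetical.

```python
def joint_confidence_coefficients(beta, n):
    """Map each subset size p = 2..n to the lower bound beta**(p - 1)."""
    return {p: beta ** (p - 1) for p in range(2, n + 1)}


# beta = 0.95 and n = 5 are illustrative choices, not values from the paper.
for p, bound in joint_confidence_coefficients(0.95, 5).items():
    print(p, round(bound, 4))
```

Because the bound is relaxed for larger subsets, the resulting intervals can be shorter than those of procedures that hold a single coefficient for all $n$ means at once.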


An Analysis of Agriculture-Manufacturing Differentials in Service Income per Gainful Worker, by
State, 1889-1899 to 1929-1939. Richard Easterlin, University of Pennsylvania.


This paper is concerned with the relative difference between service income per employee in manu- 
facturing and net income from farming of persons engaged in agriculture in 1889-99 and 1929-39. 
Drawing principally on census data, an analysis is presented of levels and trends in this income
differential by state and the bearing of these state income differentials on that for the country as a whole. The
principal conclusions are as follows. 

In all states average income in manufacturing exceeded that in agriculture in 1889-99 and 1929-39,
though usually not by as much as for the country as a whole. The size of the relative difference varied 
considerably among states and regions. A large difference was generally due to a relatively low average 
farm income, not a relatively high average salary-wage. There is a faint suggestion that the size of the 
income differential varied inversely with the degree of economic development of a state, and that 
relatively large changes in the proportion of manufacturing to agricultural employment were associated 
with relatively high income differentials. 

In most states the relative income differential narrowed during the period and in some substantially, 
but the differential for the country as a whole declined only slightly. The contrast between the state and 
countrywide movements is due chiefly to the large weight exercised by the relatively small number of 
states in which the differential narrowed least or actually widened, and the absence of a shift in the 
distribution of agricultural employment toward high farm income states. 


Concepts Employed in Labor Force Measurements and Uses of Labor Force Data. A. Ross Eckler,
Gertrude Bancroft and Robert Pearl, Bureau of the Census.


The uses of labor force statistics are so varied that the question of concepts is bound to be a contro- 
versial one. Data are needed for measurement of current changes in economic activity, for general man- 
power analysis, for research into long term trends, for the study of special problem groups, and for many 
other purposes. With the help of various advisory committees, the Census Bureau has tried to meet 
these often conflicting demands by providing detailed information on the present major categories and 
by undertaking experimental work to measure various problem groups whose classification is under dis- 
cussion from time to time. Work has also been done to estimate the effect of certain rules such as the 
exclusion of children under 14 years from the estimates, and the selection of a single calendar week as the 
time reference for the data. 

Much of the discussion of concepts has centered on the classification of persons as employed who
did any work during the week, or who had jobs from which they were absent and were not seeking other 
jobs. Special surveys of part-time workers (those with less than 35 hours of work during the week) have 
furnished a measure of part-time employment for economic reasons that has proved a useful supplement 
to the statistics on total unemployment. 

Because job attachments of persons not actually at work are in some cases fairly tenuous, it is 
argued that some categories should be classified as unemployed or not in the labor force rather than as 
employed. A recent study of the duration of absence from jobs has thrown light on the strength of job 
attachments. About 75 per cent of the persons absent from their jobs expected to be back at work 
within 30 days of the start of their absence and therefore could be regarded as having a fairly strong 
claim to a job. On the other hand, among those not working because of illness or bad weather, longer 
absences were more common. These findings have indicated the need for some sharpening of the concept. 

Another problem area has been the identification as unemployed of persons who are on the border 
line of the labor force and who may be incorrectly classified as not in the labor force. A number of 
experiments designed to measure this group have been conducted. They show that perhaps 300,000 to 
500,000 persons, largely housewives and teen-age boys and girls, may be in this category. 

The experimental program has also included efforts to reconcile the employment data with those
from establishment reports. Information collected on dual job holding, on unpaid absences, and on 
other factors has proved useful, but has by no means explained all the differences. Apparently concep- 
tual differences are not the only factors causing discrepancies between the two types of data. Sampling 
and measurement techniques must also be examined.

As a result, the authors found that the use of statistical methods in industry has progressed very well
and farther than believed. There is an interest, to the extent that 66% of those not using statistical
methods did want a summary of the answers, and that 63% of the questionnaires were returned, even in this
unsponsored inquiry of a stranger. The kinds of use indicate a growing interest in the more complex
methods and, finally, that the staff workers, many of whom wrote letters in addition to the formal
answers, are more aware of the needs than their superiors, who possibly received their schooling before
the teaching of the statistical method became widespread.


Multiple Regression with Missing Observations among the Independent Variables. G. L. Edgett,
Virginia Polytechnic Institute.


A sample with observations missing is obtained from a trivariate normal population. Equations for
finding the maximum-likelihood estimators of means, variances and covariances have been obtained.
Under certain assumptions explicit values of these estimators are obtained. Finally, the problem of
finding maximum likelihood estimators of the regression coefficients of $x_1$ on $x_2, x_3, \ldots, x_n$ holding the
latter constant is discussed.


Data Allocation in Observing a Quadratic Relation. A. de la Garza, Carbide and Carbon Chemicals
Company.

Data are taken in a pilot plant to study the relation between a quality characteristic $y$ and a
controlled variable $x$. It is supposed that $y(x) = \alpha + \beta x + \gamma x^2$ suitably describes the relation, $\alpha$, $\beta$, and $\gamma$
being unknown constants, and that the observations of $y$ are uncorrelated and have equal variance. Due
to cost restrictions, only $N$ points $(x_i, y_i)$, $i = 1(1)N$, are obtainable. Due to equipment restrictions, these
points must be taken in a specified $x$-range, $x_L$ to $x_U$.

Let $Y(x) = a + bx + cx^2$ be the least squares estimate of $y(x)$, $a$, $b$, and $c$ being the least squares
estimates of $\alpha$, $\beta$, and $\gamma$. The paper discussed the problem of how the permissible $N$ observations $y_i$,
$i = 1(1)N$, should be spaced in the specified range, $x_L$ to $x_U$, for the following purposes: (1) to minimize
the maximum variance of $Y(x)$ for any $x$ inside the range $x_L$, $x_U$; (2) to minimize the variance of $Y(x)$
for a given $x = \xi$ outside the range $x_L$, $x_U$; (3) to minimize the variance of the estimate $c$ of $\gamma$. (1) and (2)
provide optimum interpolation and extrapolation. (3) provides an optimum test for the hypothesis of a
quadratic relation versus the hypothesis of a linear relation.

For (1), $N/3$ observations are located at each of $x_L$, $(x_L + x_U)/2$, and $x_U$. For (2), the $N$ observations are
distributed at $x_L$, $(x_L + x_U)/2$, and $x_U$ in proportions depending on the extrapolation point $\xi$. For $\xi \gg x_U$
or $\xi \ll x_L$, $N/4$ observations are located at $x_L$, $N/2$ at $(x_L + x_U)/2$, and $N/4$ at $x_U$. For (3), $N/4$
observations are located at $x_L$, $N/2$ at $(x_L + x_U)/2$, and $N/4$ at $x_U$.

For the interpolation problem, it was shown that the $N$ observations may be reasonably located at
more than three $x$-locations with little increase in the maximum variance of $Y(x)$ in the range $x_L$, $x_U$.
Applications to minimizing costs of such experiments were made.
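
The two three-point allocations described above can be compared numerically. The sketch below is only illustrative: it assumes the permissible range has been rescaled to run from -1 to 1, so that the three locations are -1, 0 and 1, and it reports variances in units of the error variance divided by $N$. All names are hypothetical, and exact fractions are used so that no rounding enters.

```python
from fractions import Fraction as F


def info_matrix(points, weights):
    """Per-observation information matrix for the model a + b*x + c*x**2."""
    m = [[F(0)] * 3 for _ in range(3)]
    for x, w in zip(points, weights):
        f = [F(1), F(x), F(x) ** 2]
        for r in range(3):
            for s in range(3):
                m[r][s] += F(w) * f[r] * f[s]
    return m


def inverse3(m):
    """Inverse of a 3x3 matrix by the adjugate formula (exact with Fractions)."""
    (a, b, c), (d, e, f), (g, h, i) = m
    det = a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)
    adj = [
        [e * i - f * h, c * h - b * i, b * f - c * e],
        [f * g - d * i, a * i - c * g, c * d - a * f],
        [d * h - e * g, b * g - a * h, a * e - b * d],
    ]
    return [[v / det for v in row] for row in adj]


def scaled_variance(points, weights, x):
    """Variance of the fitted value at x, in units of sigma^2 / N."""
    inv = inverse3(info_matrix(points, weights))
    f = [F(1), F(x), F(x) ** 2]
    return sum(f[r] * inv[r][s] * f[s] for r in range(3) for s in range(3))


def scaled_variance_c(points, weights):
    """Variance of the quadratic coefficient c, in units of sigma^2 / N."""
    return inverse3(info_matrix(points, weights))[2][2]


POINTS = [-1, 0, 1]                        # rescaled lower end, midpoint, upper end
THIRDS = [F(1, 3)] * 3                     # allocation given for purpose (1)
QUARTERS = [F(1, 4), F(1, 2), F(1, 4)]     # allocation given for purpose (3)
GRID = [F(k, 100) for k in range(-100, 101)]

max_thirds = max(scaled_variance(POINTS, THIRDS, x) for x in GRID)
max_quarters = max(scaled_variance(POINTS, QUARTERS, x) for x in GRID)
print(max_thirds, max_quarters)
print(scaled_variance_c(POINTS, THIRDS), scaled_variance_c(POINTS, QUARTERS))
```

Under these assumptions the equal-thirds allocation gives the smaller maximum variance of the fitted curve over the range (3 against 4), while the quarter-half-quarter allocation gives the smaller variance for $c$ (4 against 9/2), in line with purposes (1) and (3).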


Migration and Occupational Mobility in a Moderate-Sized Pennsylvania Community. Sidney
Goldstein, University of Pennsylvania.

The changing patterns of migration and occupational mobility in Norristown, Pennsylvania, in each 
decade from 1910 to 1950 have been analyzed through data obtained from the integrated use of city 
directories, vital statistics records, and school records. The importance of migration in changing the size 
and composition of the Norristown labor force has declined markedly since 1910. Whereas in the 1910- 
1920 decade, the net balance of in-migrants over out-migrants was 213 per 1,000 population, by 1940- 
1950 this rate was only 35 per 1,000. In the former decade, almost all occupational groups experienced 
large net gains through migration. By 1940-1950, however, several groups lost as a result of migration, 
and the gains of all others were significantly lower than they were in 1910-1920. 

On the other hand, during the same forty years the importance of occupational mobility in changing 
the labor force composition was increasing. Of the resident male labor force population of the 1910-1920 
decade, 75.9 per cent were in the same occupational group at the beginning and at the end of the decade. 
In the 1940-1950 decade, the proportion of occupationally stable persons was only 65.1 per cent. The 
decrease in stability from 1910-1920 to 1940-1950 characterized all occupational groups except the 
semi-skilled for whom the stability rates in both these periods were approximately the same. The
direction and range of movement of the mobile segments of the labor force of these two extreme decades have
not changed significantly.

The analysis of the data on migration and occupational mobility suggests that these two processes 
serve to complement each other and in so doing, jointly serve to meet the changing needs of the local 
economy and thereby to effect changes in the labor force structure. 


Introductory Course in Applied Statistics for Students with Limited Training in Mathematics. C. H.
Goulden.

A limited training in mathematics is defined as consisting of high school mathematics with courses 
in algebra, trigonometry, and analytical geometry, either in a final year at high school or in the first 
year at a university. Stress is placed on the importance of explaining the role of mathematics in statistics
and the extent of the mathematical knowledge required by the student. Certain elementary mathemati- 
cal ideas are taught at the beginning of the course, chiefly simple probability, permutations and combina- 
tions, and simple algebraic functions. 

Emphasis is placed on the necessity for concrete rather than abstract thinking throughout the 
course. Examples are given after a thorough drilling on the calculating machine and these are discussed 
in lectures after the student has become acquainted with the problem. The very difficult point of how to 
deal with the idea of continuous frequency distributions is handled by studying the distributions of 
attributes in the laboratory, leading from these by logical argument to the concept of the continuous 
distribution. The logic of statistics is given an important place and particular stress is placed on the 
logic of the test of significance. 

A brief outline of the course is given together with a list of the text books and references required. 


Variance Heterogeneity in a Randomized Block Design. Franklin Graybill, Oklahoma A. and M.
College.

Consider the linear model given by

$y_{ij} = \mu + t_i + b_j + e_{ij}$, $i = 1, 2, \ldots, n$; $j = 1, 2, \ldots, m$; $m > n$,

where $y_{ij}$ is the observation, $\mu$ is a general constant, $t_i$ is the treatment effect, $b_j$ is the block effect, and
$e_{ij}$ is a random error from a normal frequency function with the following properties ($E$ denotes
mathematical expectation): (a) $E(e_{ij}) = 0$, (b) $E(e_{ij}^2) = \sigma_i^2$, (c) $E(e_{pj} e_{qj}) = \sigma_{pq}$, (d) $E(e_{pj} e_{qk}) = 0$ if $j \neq k$. That is to say,
the variance of the errors may differ from treatment to treatment, correlations of the errors are
permitted within the blocks, and the errors must be independent from block to block.

In this paper, it is shown that an exact test of significance of the hypothesis $t_1 = t_2 = \cdots = t_n$ can
be made by using an extension (due to P. L. Hsu) of Hotelling’s $T^2$ test.


The 2? Factorial in a Latinized Rectangular Lattice Design. Boyd Harshbarger, Virginia Polytechnic
Institute.

Tests in pairs can be useful to evaluate the effects of treatments or combinations of treatments on 
the propellant as they affect the missile in flight and at the same time eliminate the effects of atmospheric
conditions. Data for such tests will, in general, be secured under several restrictions, such as different 
elevations, different temperatures, and variable atmospheric conditions. One of the chief interests may 
be in linear combinations of the treatments or the so-called factorial effects including the interaction of 
treatments. If the problem is one in which the cost of the tests is important, a very small efficient test is 
desirable. This property of efficiency is usually found in factorials, and it only remains to fit the factorial
in a design that will have the desired restrictions and adjustments. This paper develops a factorial in a 
Latinized rectangular lattice design so that, (1) the effects of incomplete blocks can be removed from 
the treatment effects, factorial effects, and the interaction of row and replicate effects, (2) estimates 
can be provided for the adjusted treatments, (3) mutually independent estimates can be made for the 
factorial effects, (4) tests of significance can be provided for (a) adjusted treatment effects, (b) adjusted
independent factorial effects, (c) adjusted interaction of row and replication effects, (d) row effects, and 
(e) replication effects. 


Error Rates and Sample Sizes in Multiple Comparisons. H. Leon Harter, Wright-Patterson Air
Force Base.

A study is made of the error rates, $\alpha$ and $\beta$, and their relation to sample size, $N$, for three test procedures
that have been proposed for multiple comparisons. These three are the least significant difference
(LSD) test, Tukey’s studentized range test, and Fisher's test. Each method can be used for setting
$100(1-\alpha)\%$ confidence limits or for making significance tests at level of significance $\alpha$. In either case
one makes $C_2^m$ statements of comparison in an experiment involving $m$ means. Suppose one conducts a
series of such experiments. For the LSD test, the probability $\alpha_L$ of an error of Type I is the expected
proportion of statements which are wrong. The corresponding probability for Tukey’s range test, $\alpha_W$, is
the expected proportion of experiments with one or more wrong statements. For Fisher's test, $\alpha_F$ is the
expected number of wrong statements per experiment. A table gives the values of $\alpha_L$ corresponding to
$\alpha_W = 0.05, 0.01$ for various combinations of $m$ and $N$, also corresponding to $\alpha_F = 0.05, 0.01$ for various
values of $m$. Another table compares $\beta_L$ and $\beta_W$ for $\alpha_L = \alpha_W = 0.05, 0.01$ for various combinations of $m$
and $N$, and for various values of $\delta = |\mu_i - \mu_j|/\sigma$. A third table gives the sample size $N$ necessary to fix
both $\alpha$ and $\beta$ at certain levels for both the LSD test and Tukey’s test, for various combinations of $m$ and $\delta$.
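
The three notions of Type I error rate distinguished in this abstract can be illustrated numerically. The following is a minimal sketch in Python and is not taken from the paper: it assumes that all m means are in fact equal, that every pairwise comparison is made at the same per-comparison level, and, as a deliberate simplification, that the comparisons are independent (they are not in practice, so the experimentwise figure is only a rough guide). All function names are hypothetical.

```python
from math import comb


def n_comparisons(m: int) -> int:
    """Number of pairwise statements of comparison among m means."""
    return comb(m, 2)


def expected_wrong_per_experiment(alpha_per_comparison: float, m: int) -> float:
    """Expected number of wrong statements per experiment.

    By linearity of expectation this holds whether or not the
    comparisons are independent.
    """
    return alpha_per_comparison * n_comparisons(m)


def experimentwise_rate_if_independent(alpha_per_comparison: float, m: int) -> float:
    """Probability of one or more wrong statements in an experiment.

    ASSUMPTION: comparisons are treated as independent, which is only
    an approximation for pairwise comparisons of the same means.
    """
    return 1.0 - (1.0 - alpha_per_comparison) ** n_comparisons(m)


if __name__ == "__main__":
    alpha = 0.05
    for m in (2, 5, 10):
        print(
            m,
            n_comparisons(m),
            round(expected_wrong_per_experiment(alpha, m), 3),
            round(experimentwise_rate_if_independent(alpha, m), 3),
        )
```

With a fixed per-comparison level, both the expected number of wrong statements and the chance of at least one wrong statement grow with m, which is why the three procedures cannot share a single definition of the error rate.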


The Dun and Bradstreet Surveys of Businessmen’s Expectations. Millard Hastay, National Bureau
of Economic Research.


This paper is an attempt to validate the working hypothesis that expectations data make a net 
contribution to our ability to forecast the future of important economic variables. By a “net contribution”
is meant that the expectations show substantial positive association with subsequent experience
after allowance is made for the relation of these expectations to other data of potential forecasting value 
that are or become available at the same time as the expectations. Such auxiliary data are supplied by 
the Dun and Bradstreet Surveys themselves in the form of percentage distributions of the reporting 
firms according to the quality of their business experiences in the period just closed, and by the De- 
partment of Commerce in the form of aggregate time series which represent summations for past 
quarters of the economic variables to which the expectations of individual firms refer. 

The findings of the paper, while not strictly conclusive, strongly suggest that in forming their ex- 
pectations business executives take account of information that is not wholly dependent on current or 
past values of the variables reported on. The expectations thus have forecasting value when used in 
combination with such pre-existing data. 

As a test of these findings, the Dun and Bradstreet expectations are used to appraise the timing 
and progress of the current recession on the basis of statistical relations worked out for the pre-recession 
period. The conclusion is reached that a revival of activity was under way by the second quarter of 
1954 and that the recession has been a notably mild one. 


Dollars and the Dollar Area. Donald F. Heatherington, National Foreign Trade Council.


Since 1945 the position of the United States, Canada and the dollar area as a whole has been 
fashioned increasingly by factors and forces outside their direct, independent control. Vital policy 
decisions have been taken repeatedly in response to the apparent dictates of special emergency situations 
rather than arrived at as essential, studied elements in a long-range program. There has been an adjust- 
ment to a new, special type of environment, with military aid, military expenditures and foreign sub- 
sidies bulking large in the balance of payments. Only as the general environment changes may we look 
for any substantial shift in the structure of either the United States or Canadian balance of payments. 

The dollar area is an even more loosely knit, informal association than its historic counterpart, the 
sterling area, and owes its existence to natural growth and the reciprocal flow of trade and investment 
rather than to any decision or declaration on the part of a central authority or the constituent states. 

The heart of the dollar area is the United States, in that it is the only member whose currency would 
be considered a full central reserve currency. This is an acknowledgment not only of the gold conversion 
facility afforded international holders, but also of the sheer industrial strength and financial resources 
of the economy. These factors, together with the volume of external trade conducted and the worldwide
acceptability of the dollar, have made the United States a natural clearing center and the dollar
a currency of settlement.

Canada stood for years in an indeterminate position, with one foot in the dollar bloc and the other 
in the sterling system. With the passage of time several forces combined to align the Canadian dollar 
more with that of the United States, and to fix it as a dollar oriented currency. 

Since 1945 Canada’s only means of divorcing itself from the dollar area would have been through 
the imposition of the most rigid restrictions and the establishment of discriminatory trade controls, 
either of which, if carried to the necessary extremes, would have wrenched the Canadian economy and 
placed certain segments at a distinct disadvantage. 

The United States and Canada, together, constitute the hard core of the dollar area as it stands, 
forming with Cuba and Panama, the inner circle. 

On the basis of 1952-1953 trade returns, countries in the outer dollar area rely on the United States 
and Canada as a joint market and source for well over half of their merchandise exports and imports 
with the percentages ranging in several instances to above 75 per cent. 

A preponderant share of the external direct investment in the outer dollar area has come from the 
United States, and much of this investment has been placed in enterprises directly or indirectly as- 
sociated with the production of those commodities and products for which a market has been or can be 
developed in the United States. 

The supposed “dollar shortage” has been symptomatic of far deeper difficulties, induced or in- 
herent, than the intimated failure of the United States—or Canada—to place enough dollars abroad to 
meet every purpose. During the past eight years there has been, in fact, not a single “dollar problem,” 
but several problems which have focussed on the dollar, and which, while separate, have had a reciprocal 
influence. In essence the first of these involved a lack of confidence, the second a lack of production and 
capital, the third a lack of social stability and acceptance of economic realities. 

Between the beginning of 1946 and the end of 1953 the United States and Canada jointly have 
placed approximately $153 billion at the disposal of other economies, giving them command over dollar 
goods and services to this extent. 

In the near term the dollar supply picture is not apt to undergo any appreciable change, either
in amount or direction of flow. There is no reason to suspect that the direction of primary outflow of 
United States dollars will change, since the products and commodities in greatest potential demand are 
those from within the dollar area and raw material or mineral regions. Where these dollars in turn are 
first spent, however, may be expected to undergo considerable change, as local production facilities 
and competition alter market structures. A cautionary note must be sounded against any expectation of 
a substantial sustained rise in exports from the dollar area following on the re-establishment of some 
sort of convertibility to the major European currencies. 

There is not only considerable sympathetic understanding but also active interest in the United 
States with respect to the developmental aspirations and needs of the world. However, the process by 
which development occurs must necessarily involve a more effective marshalling of national savings for 
internal productive investment purposes and/or a less suspicious—at times hostile—attitude toward 
private direct investment abroad. 


Needed Improvements in the Census from the Standpoint of Public Housing Users. Morton Hoffman,
Housing Authority of Baltimore City.

The needs of a variety of public and private users of housing census data have been well served by 
the 1940 and 1950 Censuses of Housing. The interests of the public housing analyst differ in degree 
rather than kind from those of technicians of other federal or local programs of community betterment, 
and also have much in common with the statistical requirements of mortgage lenders and others con- 
cerned with real estate markets. Certain additions and modifications in the Housing and Population 
Censuses would enhance their usefulness for those engaged in public and private housing activities. 
Data on conversions, included in 1940 but omitted in 1950, merit inclusion because of their significance 
for the study of changing housing patterns, and because of the expectation of declines in quality and 
size of converted dwelling units. The sufficiency of only the renter- and owner-occupancy categories is 
questioned in view of the growth of “contract ownership” schemes in urban areas. 

The adequacy of the “dilapidation” concept used to measure structural condition in the 1950 
Census is assessed. Because local rehabilitation programs, under the stimulus of the 1954 Housing Act, 
will be causing a sharp reduction in the number of outside toilets or shared baths in particular areas, 
the substandardness concept of housing and redevelopment agencies, which is based on the combination 
of dilapidation and presence of three plumbing facilities, will not be nearly as useful in 1960 as in the 
past. Possible scaling devices in the housing quality indices are mentioned. The desirability of coverage 
of sanitary, safety, and environmental aspects of housing from the standpoint of the newer rehabilita- 
tion and conservation programs is brought out, but strong doubts are expressed as to the possibility 
of collecting highly technical environmental data in a mass Census. It is concluded that the Census 
Bureau must keep abreast of the existing and embryonic programs of community improvement so that 
its data are aimed at assisting in the conduct of these programs and facilitating an evaluation of their 
effectiveness. 

Mobility data on a household as well as area basis would be extremely useful to those concerned 
with the fields of housing and community planning and organization. Mobility by race and tenure 
should be crossed with income, housing quality, family type and size, age of family head, crowding and 
doubling up. Information on family moves over a three-year rather than a one-year period, and on 
mobility intentions, would also be of great interest. The 1950 Census materials on family-housing rela- 
tionships are applauded, with recommendations that a nonwhite breakdown and more cross-tabulations 
be included in the future, and that the important “broken family” category be separated out. Income 
data on single persons should not be merged with that relating to families of two or more persons in the 
housing and family-housing tabulations. Concern with the characteristics of displaced families on the 
part of newer and older governmental housing programs increases the need for family income data for 
areas smaller than a census tract. It is recommended strongly that an intercensal housing survey be 
conducted because of the need for current data and the importance of experimentation with new tech- 
niques and procedures. 


The Undergraduate Program in Statistics at Iowa State College. Paul G. Homeyer and D. V. Huntsberger.

A program leading to a B.S. degree was started at Iowa State College in July 1947, at which time 
the Department of Statistics was created. The objectives of this program, in addition to offering the 
undergraduate degree are: (1) to teach an all-college course in statistical principles at the junior college 
level, (2) to offer general courses in statistical theory and methods as well as some specialized courses for 
students majoring in other subject matter areas, and (3) to attract and prepare exceptional students 
for graduate work in Statistics. The first B.S. degree in Statistics was awarded in June 1949 and to date 
twenty-six such degrees have been granted. Of these twenty-six graduates, seven are working for or 
have completed advanced degrees in Statistics. In addition to the majors, four students who minored 
in the department in their undergraduate program have elected to work for graduate degrees in Statistics.
The undergraduate major in Statistics consists of a minimum of thirty credits, which normally
consist of two quarter sequences in Statistical Methods, the Theory of Statistics, and Processing of
Data, plus at least four one quarter courses elected from Business Statistics, Experimental Designs,
Survey Designs, Quality Control, Economic Statistics, Psychological Statistics, and Industrial Statistics.
Each student must also work in two minor fields to acquire a total of thirty credits. One minor
is invariably Mathematics and normally consists of at least twelve hours beyond the Calculus. For a
second minor the student is encouraged to elect a subject matter area where Statistics has recognized
applications.

Problems have arisen in connection with finding suitable texts and personnel but it is of interest 
to note that there has been almost no objection to consolidating the teaching of Statistics in a single 
department. With increasing numbers of students certain changes can be made to improve the program.
At present the undergraduate majors take some courses primarily intended for graduate students minoring
in Statistics. Some changes in the present courses and the addition of new courses are being considered.


Jewish Demographic Research in the United States. C. Morris Horowitz, Brooklyn College.


Although the Jewish people, even during the Biblical period, have been census-conscious, today the
Jewish population in the United States knows nothing about its demographic characteristics. Unlike the 
official Canadian statistics, the United States Census Bureau does not collect statistics by ethnic clas- 
sifications. 

When the Jewish community in the United States was in its infancy, welfare efforts were little more 
than sporadic charitable attempts. Today, with a Jewish population in the order of magnitude of five 
million, planning for welfare, cultural, educational and religious institutions and functions must be 
based on scientifically prepared demographic data. 

Numerous methods and techniques have been employed to estimate the extent—and to a very 
limited degree, the demographic characteristics—of the Jewish population. Through the Census of 
Jewish Congregations of the United States Census of Religious Bodies, conducted every ten years from 
1850 through 1890 and from 1906 through 1936, the Census Bureau conducted a census of Jewish
congregational membership. For the census of 1926 and of 1936, the definition of Jewish congregational 
membership was so broadened in scope, that for all practical purposes, it became a census of the Jewish 
population. The data was collected by extremely questionable methods and from unreliable sources, 
and yielded information which was not comparable from one census year to another and which was little 
more than an aggregate of guesses. 

The Yom Kippur or School Attendance Method of estimating the extent of the Jewish population is 
based on the premise that Jewish children do not attend public school on Yom Kippur, the most holy 
day on the Hebrew Calendar. It involves a comparison of the registration in the public schools, the 
number of absences on a “normal” day, and on Yom Kippur. The result, modified by some constant, 
supposedly yields a Jewish child population figure. Based on the assumption that the Jewish age 
distribution is similar to that of the general population, the extent of the total Jewish population is 
estimated. This technique involves too many questionable assumptions. 
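
The arithmetic behind the School Attendance Method can be made explicit. The sketch below is one plausible reading of the description above, written in Python; the abstract gives no formula, so the exact form of the calculation, the parameter names, and the figures used in the test are all assumptions made for illustration. It mainly serves to show how many inputs must be supplied by assumption.

```python
def school_attendance_estimate(
    registration: int,
    absent_normal_day: int,
    absent_yom_kippur: int,
    correction_constant: float,
    school_age_share: float,
) -> dict:
    """Hypothetical reconstruction of the School Attendance Method.

    ASSUMPTIONS (none are stated in the source):
    - excess absences on Yom Kippur are attributed to Jewish pupils;
    - the result is scaled by a chosen correction constant;
    - school-age children form the same share of the Jewish population
      as of the general population (school_age_share).
    """
    if not 0 < school_age_share <= 1:
        raise ValueError("school_age_share must lie in (0, 1]")
    excess_absences = absent_yom_kippur - absent_normal_day
    if excess_absences < 0:
        raise ValueError("fewer absences on Yom Kippur than on a normal day")

    jewish_children = excess_absences * correction_constant
    return {
        "jewish_children": jewish_children,
        "share_of_registration": jewish_children / registration,
        "total_jewish_population": jewish_children / school_age_share,
    }
```

Every argument other than the two absence counts and the registration is a judgment supplied by the analyst, which is the weakness the abstract points to.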

The technique, Interpolation from Census Data, assumes that the demographic characteristics of an
area which is densely populated by Jews are the same as the characteristics of the Jewish people in that
area. The assumptions are many, and with the near halting of immigration and, consequently,
an increasing proportion of the Jewish population being native born, this method could not be applied.

The Master List method involves the formation of a list from membership rolls of Jewish organiza- 
tions, fund raising campaigns, etc. A questionnaire is then circulated either among the entire list or a 
portion of it. These lists are biased, and one questions the validity of the assumption that the character- 
istics of affiliated Jews are the same as those of the unaffiliated ones. 

The Jewish Name Method is based on the preparation of a list of typically Jewish names and on a
study of the incidence of these names in the community. The preparation of such a list is a subjective
task and the list of names in the general community to be studied usually turns out to be a biased one.

The Matching Technique, developed by the U. S. Census Bureau, involves the sending of a list of
Jewish names and addresses to the Bureau, which will pull out the corresponding IBM Census cards, and 
run them off for any information desired. The preparation of the original list, and the fact that the 
names and addresses must be as of the census date, prove to be weaknesses of this technique. 

The Birth Rate and Death Rate techniques are based on the determination of either a birth rate or 
age specific birth rate, or a death rate, or an age specific death rate. Basically these methods revolve 
around the Jewish Name Method. Not only do these techniques involve the fallacies inherent in the
latter, but they develop limitations of their own.

As a positive approach to this problem, it is suggested that the possibility of the organization of a 
Jewish Demographic Laboratory attached to some university, possibly the Yeshivah University in
New York, should be investigated; and that the feasibility of sampling studies, both on a national and
local level, should be studied. 


Some Recent Advances in Statistics in the U. S. Department of Agriculture. Earl E. Houseman,
U.S.D.A.

A sketch is given of some of the developments in statistical procedures for a few broad areas of 
activity within the Department of Agriculture including the review and coordination of statistical work, 
research work on problems of crop and livestock estimating, studies of statistical techniques for economic
analyses, and the establishment of a Biometrical Services staff in the Agricultural Research Service.
Current research on crops, primarily cotton and corn, is directed toward (i) getting a better understanding
of the concepts underlying farmers’ reports of acreage and yield, (ii) determining how much
various factors contribute to differences between reported yields and estimates from sample measure- 
ments of the crop in the field just prior to harvest, and (iii) further work on sampling techniques. In 
the field of econometrics, studies have included work on the possible application of simultaneous equa- 
tion models, spatial equilibrium models, and linear programming, in addition to improvements in 
methods of using the more conventional least-squares technique.


The Relations between Correlational Expressions of Test Reliability and Variance Ratio Expressions.
Cyril J. Hoyt.


The equivalence of a number of formulas for obtaining the split-half reliability coefficient was 
previously shown by Cronbach in a 1951 Psychometrika paper. It was pointed out in Hoyt’s paper that
these formulas of Flanagan, Guttman, Mosier, Rulon and Cronbach are individually equal to the ratio
of four times the covariance of the test halves to the variance of the total scores, when scores are expressed
as deviations from their respective means, or $r_{tt} = 4\sum a'b' / \sum t'^2$, where $a'$ and $b'$ are the half-test
scores and $t'$ is the whole test score, all expressed in deviations from their respective means. It was also
noted that the formula given by Rulon which expresses the reliability coefficient in terms of the ratio
of the variance of differences in half test scores to the variance of the whole test is equivalent to the
analysis of variance reliability formula, $r_{tt} = (MS_I - MS_E)/MS_I$, where $MS_I$ is the mean square for individuals
and $MS_E$ is the mean square for error. If one employs the maximum likelihood estimate as given
by Jackson and Ferguson for obtaining the estimate of the product moment correlation coefficient be- 
tween the scores on two halves of the test and then uses the Brown-Spearman formula for obtaining the 
reliability coefficient of the whole test, this likewise is algebraically equivalent to all the formulas sum- 
marized by Cronbach. Applications were made to new data. It was noted that the use of the usual 
formula for the product moment correlation between the half tests in conjunction with the Brown- 
Spearman formula is lacking in consistency of assumptions since the Brown-Spearman formula does 
assume equal variance for the half-test scores while the usual product moment correlation does not. If 
one is willing to make the assumption of equal variances of the half-tests for justifying his use of the 
Brown-Spearman formula, consistency would demand that he make this assumption earlier in correlating
the half-test scores. This assumption would imply the usefulness of the appropriate maximum likelihood 
estimate as given by Jackson and Ferguson in Bulletin No. 12 University of Toronto Department of 
Educational Research. 
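
The algebraic equivalences described in this abstract can be checked numerically. The sketch below, in Python, is an illustration only and does not reproduce the paper's own computations: the half-test scores in the test are invented, the function names are hypothetical, and the equal-variance correlation is assumed to take the form 2 cov(a, b) / (var(a) + var(b)), which is our reading of the Jackson and Ferguson estimate rather than a quotation of it.

```python
def mean(xs):
    return sum(xs) / len(xs)


def variance(xs):
    """Sample variance with divisor n - 1."""
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)


def covariance(xs, ys):
    """Sample covariance with divisor n - 1."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)


def split_half_reliabilities(a, b):
    """Return several split-half reliability expressions for half-test scores a, b."""
    a, b = list(a), list(b)
    n = len(a)
    t = [x + y for x, y in zip(a, b)]  # whole-test scores
    d = [x - y for x, y in zip(a, b)]  # differences between halves
    var_a, var_b, var_t, var_d = variance(a), variance(b), variance(t), variance(d)
    cov_ab = covariance(a, b)

    # Two-way analysis of variance: individuals x halves, one score per cell.
    grand = (sum(a) + sum(b)) / (2 * n)
    ss_total = sum((x - grand) ** 2 for x in a + b)
    ss_individuals = 2 * sum((ti / 2 - grand) ** 2 for ti in t)
    ss_halves = n * ((mean(a) - grand) ** 2 + (mean(b) - grand) ** 2)
    ss_error = ss_total - ss_individuals - ss_halves
    ms_individuals = ss_individuals / (n - 1)
    ms_error = ss_error / (n - 1)  # (n - 1)(2 - 1) degrees of freedom

    # ASSUMED form of the equal-variance correlation between the halves.
    r_equal_var = 2 * cov_ab / (var_a + var_b)

    return {
        "covariance_ratio": 4 * cov_ab / var_t,
        "rulon": 1 - var_d / var_t,
        "flanagan": 2 * (1 - (var_a + var_b) / var_t),
        "anova": (ms_individuals - ms_error) / ms_individuals,
        "brown_spearman_equal_var": 2 * r_equal_var / (1 + r_equal_var),
    }
```

Called on any pair of half-test score lists with nonzero total variance, the five entries agree to rounding error, which is the equivalence the abstract asserts.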


Personal Saving in Canada: Direct Estimates 1939-1953. D. J. R. Humphreys, Bank of Canada,
Ottawa.

To date in Canada there has been only one published estimate of total personal saving. This is the 
“residual” estimate of personal saving arrived at in the National Accounts by deducting estimated per- 
sonal expenditure from estimated personal disposable income. This residual estimate of personal saving 
has two drawbacks. First, a small percentage error in the estimate of income or expenditure may produce 
a much larger percentage error in the residual estimate of saving and because of the magnitudes involved 
this error in saving may be quite large in absolute dollar terms. Second, the resulting estimate is a single 
aggregate and no clues are given as to its component items and their movements. 
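
The first drawback is a matter of simple arithmetic, which the following Python sketch illustrates. The figures in the usage note are invented for illustration and are not taken from the National Accounts; the function name is hypothetical.

```python
def residual_saving_error(income: float, expenditure: float, income_error_pct: float):
    """Show how a percentage error in income carries into residual saving.

    Returns (true saving, absolute error in saving, percentage error in saving).
    Expenditure is taken as measured without error for simplicity.
    """
    saving = income - expenditure
    if saving == 0:
        raise ValueError("true saving is zero; percentage error is undefined")
    mismeasured_income = income * (1 + income_error_pct / 100)
    mismeasured_saving = mismeasured_income - expenditure
    abs_error = mismeasured_saving - saving
    pct_error = 100 * abs_error / saving
    return saving, abs_error, pct_error
```

For example, with a hypothetical income of 20,000 and expenditure of 18,500, a 1 per cent overstatement of income overstates saving by about 13 per cent, because the same absolute error is set against a much smaller base.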

The “direct” estimate, using a balance sheet approach, analyzes the changes which have occurred 
in a selected period in the great variety of assets and liabilities of the personal sector which are classed 
by the National Accounts as saving. The net total of these changes produces a saving estimate which 
provides some check on the residual estimate of personal saving and at the same time reveals the
changes which have been occurring within the aggregate total. 

There does not appear to be any systematic pattern of differences between the direct and residual 
estimates and the study would seem to confirm both in absolute terms and as a percentage of personal 
disposable income the high level of personal saving in the war and post-war period, as shown by the 
National Accounts. 

The direct estimate has been very successful in providing some insight into movements of the component
items of personal saving. It has made it possible to study the changes which have occurred in
either a single year, or over a 15 year period, in liquid asset holdings; in contractual saving such as life 
insurance; in investment in residential housing; in investment by farmers and unincorporated business 
in machinery, equipment and inventories, and in such liabilities as consumer and mortgage debt, bank 
borrowings and payables. 

In addition, the direct estimate of saving has also provided a number of valuable by-products. In 
the case of liquid assets the balance sheet approach has produced estimates of total personal sector 
holdings of liquid assets other than corporate bonds and stocks. The direct estimate has been used in a 
preliminary way in the making of GNP-GNE forecasts and later it may make feasible forecasting of the 
supply and demand for funds by the personal sector. In its present preliminary state it has served as a 
useful springboard from which to launch new statistical studies, the most interesting of which has been 
the recent Dominion Bureau of Statistics survey of trusteed pension funds. It has also been reintegrated 
with the National Accounts personal income and expenditure detail to provide a modified form of source 
and use analysis. 

The apparent advantages of the direct estimate of saving should provide incentive for further de- 
velopment. Now that the National Accounts is producing quarterly estimates of personal saving, the 
direct estimate should be moved onto a quarterly basis as soon as possible. For interpretative purposes 
attempts should be made to split down the direct estimate of personal saving into its three major groups 
of savers, that is farmers, other unincorporated business, and other persons. By means of direct surveys 
some attempt should be made to determine the character of saving and investment by the personal 
sector, by income group and probably by region. Finally, the statistics where feasible should be shifted 
from a net to a gross basis in order that a money flows analysis along the lines suggested by Messrs.
Copeland and Brill might be made for the Canadian economy. 

These developmental problems are no longer ones of integration and interpretation of existing 
statistics. They are problems of developing entirely new series, series which necessitate direct contact 
with non-Government bodies, with the business and financial communities. Success will depend on their 
co-operation. 


Statistics Curricula at Purdue University. Paul Irick.


About twenty-five undergraduate, dual level, and graduate courses in statistics are offered at 
Purdue each semester to an average total enrollment of four hundred students. Approximately one-third 
of these courses deal with the theory of mathematical statistics and probability while the remaining 
courses are described as statistical methods courses. The theory courses are given in the department of 
mathematics, as are about one half of the methods courses. Other methods courses are given in the 
departments of agronomy, agricultural economics, economics, forestry, and psychology. 

One full year course for undergraduate students serves as an introduction to both theory and meth- 
ods. First and second semester methods courses at the dual level are offered in five different areas of 
application. A committee of statistics instructors from the various departments serves to bring about 
the coordination of these courses, both with respect to course content and methods of classroom pres- 
entation. Undergraduate students may major in either mathematical statistics or in statistical methods. 
Either of these curricula specifies that the student must take about forty semester hours from the 
courses in statistics and closely related subjects. Undergraduate students who do not major in statistics 
may elect to take either the undergraduate course or a first and second semester of statistical methods. 
In the latter case, the student may choose from courses which are either general in their application, or 
which are planned with special emphasis on applications in agriculture, economics, engineering, or 
psychology. 


The Adequacy (or Inadequacy) of Statistics for the Purposes of Forecasting in the Field of Mortgage
Investments. A. L. Jackson and A. W. Gilbart, The Equitable Life Assurance Society.

Reviewing available statistics and citing their specific uses, it is contended that statistics pertaining
to forces outside the real estate field, for example, government monetary policy, business conditions
generally, etc., are not adequate to permit accurate forecasting of trends which are necessary in order 
to forecast the mortgage or real estate market. Actual policy as developed under the New Deal ad- 
ministration and the current Republican administration proves the inability of men to predict or to 
regulate general economic trends. 

Specifically citing the errors in forecasting money rates, mortgage rates, real estate prices, etc., 
during the years 1952-1953 and 1954, it is concluded that this is one field in which the American Statisti- 
cal Association can perform one of its greatest services. 

Perhaps the most important conclusion in this paper is that statistics in the insurance company and 
related investment fields need to be greatly expanded and revised so that we can know more precisely 
what the supply and demand for funds is, or will be, so that we will know better how to estimate property 
income and expenses and measure values. In this connection, specific instances relating to housing, commercial
properties and industry are cited. Again, the conclusion is: much is to be done in the assembly,
coordination and dissemination of needed statistics, with the American Statistical Association assuming
the lead. 


Multivariate Analysis. J. Edward Jackson and Robert H. Morris.


The need for multivariate analysis in quality control problems is illustrated with several hypothetical
bivariate examples. An ellipse whose size and shape are determined by the covariance matrix of
the two variables is suggested as the appropriate control region. The extension to the p-variate case
utilizes Hotelling’s $T^2$ statistic and, in cases of degeneracy or near degeneracy, Hotelling’s
method of principal components.
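
An elliptical control region of this kind can be sketched in a few lines of Python. This is an illustration under simplifying assumptions, not the authors' procedure: the process mean and covariance matrix are treated as known, so that the quadratic form follows a chi-square distribution with two degrees of freedom (with estimated parameters an F-based limit would be needed), and the numbers in the usage note are hypothetical.

```python
from math import log


def t2_bivariate(x, mean, cov):
    """Quadratic form (x - mean)' cov^{-1} (x - mean) for two variables."""
    (s11, s12), (s21, s22) = cov
    det = s11 * s22 - s12 * s21
    if det <= 0:
        raise ValueError("covariance matrix must be positive definite")
    d1, d2 = x[0] - mean[0], x[1] - mean[1]
    return (s22 * d1 * d1 - (s12 + s21) * d1 * d2 + s11 * d2 * d2) / det


def control_limit(alpha: float) -> float:
    """Upper alpha point of chi-square with 2 degrees of freedom.

    For 2 degrees of freedom P(X > c) = exp(-c / 2), so c = -2 ln(alpha).
    ASSUMPTION: mean and covariance are known, not estimated.
    """
    return -2.0 * log(alpha)


def in_control(x, mean, cov, alpha: float = 0.05) -> bool:
    """True if x falls inside the control ellipse."""
    return t2_bivariate(x, mean, cov) <= control_limit(alpha)
```

With standardized variables whose correlation is 0.8, the point (1.5, 1.5) lies inside the ellipse while (1.5, -1.5) lies well outside it, although each coordinate taken separately is within two standard deviations of its mean. Separate univariate control charts would miss the second point.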

For short range control work certain modifications are necessary which involve very simple compu- 
tations and yet will yield essentially the same results as a full multivariate system. A specific application 
of these methods is made to a photographic color process. 


Mobilization Planning in Canada (Summary). R. Warren James, Dept. of National Defense. 

Historically, planning for industrial mobilization in Canada has been neglected because of geo- 
graphical proximity to a peaceful neighbor, remoteness from the internal troubles of Europe and the 
subsidiary position of Canada in the imperial political system. In wartime, Canada’s troops have tra- 
ditionally participated heavily in the supply systems of allied countries and responsibility for logistical 
support has been limited. Canadian industrial capacity has been large relative to domestic military needs 
and the major problem of war production has been to fill external demands for munitions whose mag- 
nitude cannot be predicted in advance. 

The development of an independent civilian procurement agency does not facilitate mobilization 
planning in the narrow statistical sense nor does the marked preference for competitive defense contracts. 
Some conflict exists between the optimum development of facilities to meet a future emergency and the 
goal of minimum cost in defense procurement. Since 1950, one major advance in mobilization planning 
in Canada has been the creation of the Department of Defense Production which has undertaken mo- 
bilization stockpiling and the provision of new capital facilities for defense contractors and has served as 
a training ground for industrialists and civil servants in the problems of defense procurement and 
production. 

Analytical studies of supply and requirements in Canada are basically handicapped by the close 
interdependence of Canada and the United States. Efficient war production in Canada will often depend 
on the volume of orders received from the United States or elsewhere. Forecasting of such external re- 
quirements is not very practical. Much attention has been devoted to the elimination of legislative and 
other barriers to procurement by the armed services of the United States in Canada, particularly the
Buy American Act. Conversely, Canada’s dependence on the United States for end products, materials 
and critical components means that supply levels will inevitably be influenced by administrative and 
military decisions in the United States whose quantitative effects cannot be predicted.

Because of intimate involvement with the problems of diversity of equipment, Canada has con- 
sistently espoused increased military standardization in the North Atlantic Community. A great deal 
has been accomplished in converting many classes of weapons and ammunition in Canada to United 
States patterns. This has been supplemented by Canadian participation in the cataloguing system 
originated by the United States and now spreading to other NATO countries. 


The Chemical Outlook. Jeremy C. Jenks.


During the first six months of 1954 the chemical industry operated at about 75 per cent of capacity. 
Dollar sales were a little behind in the first quarter of the year, but gradually worked up to the point 
where fourth quarter prospects promise to be satisfactory. Inventories, although still a bit higher than 
entirely desirable in their relation to sales, have declined from the top level of the autumn of 1953. At 
present with working capital of about $4.8 billion, and the long term debt under half that figure, the 
overall financial position of the chemical industry is comfortable, disproving last year’s fears of a
collapse. 

Although it seems likely that the 10 per cent rate of expansion made annually during the past 
thirty years will not be entirely maintained in the immediate future, factors in the industry point to a 
restoration of a satisfactory level. 

Total operating earnings before taxes for the last six months were about 18 per cent behind 1953. 
Approximately 25 per cent of this loss may be attributed to higher depreciation and amortization 
charges. In addition, wages, salaries, and selling costs have made increasing inroads on net earnings.
A sharply mounting outlay lies in the budget for research. Also, some plant facilities are operating
at lower than economic rates. Conversely, the rise in the amount of funds used for research expenditures 
indicates a determination of the industry to assure its future prosperity. 

A period of moderately declining corporate taxes would bring a return of the long term favorable
performance to the chemical industry. Because of the expiration of the Excess Profits Tax Law, net 
after taxes for the past six months was perceptibly higher than last year. 

One fine illustration of the industry constantly fulfilling coming needs is the great growth in plastics.
This has become a sizeable business. Today plastics are seen in the manufacture of automobiles, tele-
vision sets, and machinery, as well as for shower curtains, floor tiles, and reinforced articles. Among the
newer offerings is a fibre carpet which will be both long lasting and easily cleaned. It promises to be 
revolutionary. Several forecasts estimate that by 1957 production of plastics will exceed 4 billion pounds 
compared with about 3 billion at present. 


The Predictive Value of Data on Consumer Attitudes. George Katona, University of Michigan.


The research conducted over the past ten years by the Economic Behavior Program of the Survey 
Research Center demonstrates that changes in consumer motives, attitudes, and expectations exert a 
significant influence on economic fluctuations. Measures of such changes, as obtained through sample 
interview surveys, serve as indications of the direction of forthcoming changes in spending and saving. 

Consumer optimism reached its highest point toward the end of 1952. At that time, in contrast to 
1951, the Korean war was considered as having favorable effects on the domestic economy; confidence 
was wide-spread that the newly elected Eisenhower administration would stimulate business; and people 
were accustomed to the prevailing price level that had risen in 1950-51. Therefore, many more people 
than before decided to satisfy their needs for newer and better durable goods. 

Toward the end of 1953 American consumers looked to the future with less confidence than a year
earlier. The truce in Korea as well as news about government economies, decline in production, and 
increase in unemployment created some pessimism which, however, was held in check by satisfaction 
with personal financial welfare and with price stability. Consumers expected to purchase durable goods 
in 1954 at a lower rate than in 1953, but the decline was small. 

By June 1954 consumer sentiment improved again. Appraisals of the economic situation and 
prospects were only slightly better than at the beginning of the year, but it was significant that the 
downward trend of 1953 did not continue. One major factor contributing to this development was that 
a great many people were agreeably “disappointed” insofar as dire predictions about a recession or a
worsening of their own financial situation were not fulfilled. The improvement in sentiment was strongest 
among urban upper-income consumers. 


Needed Improvements in the U. S. Census from the Standpoint of Social Statistics Users: Social 
Statistics Section’s Paper—Neighborhood Improvement. Albert J. Kennedy, National Federation
of Settlements and Neighborhood Centers. 


Neighborhood is that aspect of the community process through which contiguous households relate 
themselves to one another in support of two primary purposes of (1) individual maintenance, repair and 
operating base, and (2) social replacement, i.e., procreation and child nurture. Neighborhood becomes 
statistically visible through the sizes, shapes and orientation of massed households, and the institutions, 
i.e., dwelling structures, shops, churches, schools and other organizations which households call into 
being. 

The neighborhood process has three bio-social stages based on space-time-foot movement intervals 
of sensory perception and response, designated (1) domiciliary cluster, (2) neighborhood, a child oriented 
interval, and (3) district, adolescent and adult oriented. Each stage is structured by the institutions and 
associations utilized by the integrated households. Neighborhood institutions reflect the character of 
their sponsoring households. Child-bearing and -rearing households structure the normative neighbor- 
hood. 

Community organization needs for its special purposes statistical aids in identifying and locating 
in municipality the different types of households in and through which individuals of different age, sex, 
class and cultural characteristics associate, especially those families actively involved in child bearing 
and rearing. 

Census Tract P. Tables 1-7 go far toward providing a satisfactory picture of individual status. 
Additional items covering (1) women as operating housewives and mothers, under occupation, and (2) 
religious affiliation as an index of culture, are needed. 

Tables H. 1-10 classify households and families as an associative process. They should be further 
developed, especially the husband-wife group, by number of children under 5, 5-14, 15-19 years of age, 
and by religious background. This material should be included in Census Tract Tables. 

Use of the word “room” as sole measure of dwelling quantity and hence of household functioning, is 
questioned. A further symbol to characterize the useable space, indoors and out, of one and two and 
some even larger structures, should be developed. The items of household equipment, television and 
refrigerator, should be supplemented by the even more functional items of garage, automobile and 
telephone. Households and families, as above, should be cross classified by dwelling structures and by
number of rooms, broken down for Census Tracts. 

Neighborhood structure is heavily conditioned by the number and by the orientation of households 
to each other. Rental housing is becoming quasi-public and public in character, and easily dominated 
by economic and political forces. This represents a departure from our typical American culture, the 
effects of which should be questioned. The H. 1-10 tables should be developed to show the location, use, 
condition and social resources of massed one, two and three room dwellings in super structures and super
neighborhoods. This material should be tracted. 

The present H., C. T. Table 3, should be developed as a modulator between P. and H. Tables to 
make the Census Tract a workable tool for identifying bio-social processes. Chief items in this develop- 
ment are institutions classified by functions and auspices. 


How Small Can the Sample Be? Nathan Keyfitz, Dominion Bureau of Statistics.


The question of sample size can only be conveniently discussed for probability samples, and where 
accuracy of the enumeration process is controlled and parameters to be estimated are clearly defined. 
Surveys such as those of consumer intentions discussed by Katona seem to fulfill these requirements. 

The sample required for any given purpose will be smaller where an effective stratification can be 
found; if auxiliary data correlated with the object of the survey can be brought into use; if the object 
of the survey may be served by asking for attributes which are fairly widespread, ideally which apply 
to 50 per cent of respondents; if “oversampling” can be used in appropriate strata; and if quantities 
rather than “yes” or “no” answers can be asked of the respondent. Smaller samples may be used and 
better interpretation of results obtained on future sales, of washing machines for example, if respondents 
are classified according to whether they now have a washing machine and how old it is. One may thus 
be led to study cohorts of washing machines and automobiles just as population students deal with 
cohorts of humans. 

An expression which shows the total error in terms of sampling and non-sampling components indi- 
cates that beyond a certain point increase of numbers enumerated in the sample is unprofitable as 
compared with work on the reduction of enumeration error and investigation of biases of optimism or 
pessimism which affect all respondents equally. 
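
The point about diminishing returns can be illustrated with a small sketch. It assumes the simplest form such an expression might take, namely a mean square error equal to the sampling variance of a proportion plus the square of a fixed bias shared by all respondents; the abstract does not print its expression, and the bias of 0.02 and the function names are hypothetical.

```python
import math

def total_error(n, p=0.5, bias=0.02):
    """Root mean square error of a sample proportion with a fixed response bias.

    n    -- number of respondents (simple random sampling assumed)
    p    -- true proportion holding the attribute
    bias -- non-sampling bias shared by all respondents (hypothetical value)
    """
    sampling_variance = p * (1.0 - p) / n
    return math.sqrt(sampling_variance + bias ** 2)

def gain_from_doubling(n, p=0.5, bias=0.02):
    """Reduction in total error obtained by doubling the sample from n to 2n."""
    return total_error(n, p, bias) - total_error(2 * n, p, bias)
```

Under these assumptions the total error can never fall below the bias itself, so that doubling a sample of 10,000 buys far less than doubling a sample of 100; past that point effort is better spent on reducing the bias.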

Intentions data based on a given sample would be most useful if they could be fitted into a predic- 
tion scheme which included intentions of investors, governments, etc., and made use of known income 
demand functions. A further increase in the effectiveness of the sample would be obtained by including
in the prediction model data on past performance of the economy.


Short-Cut Formulas for the Exact Partition of χ² in Contingency Tables. A. W. Kimball, Oak Ridge
National Laboratory.

Irwin and Lancaster [Biometrika, 36 (1949), 117-34] derived methods for partitioning χ² from
contingency tables into individual degrees of freedom. From an r×c table, the partitioning yields
(r−1)×(c−1) quantities which asymptotically are independent and have χ² distributions with one degree
of freedom. Furthermore, the single degree of freedom χ²’s sum exactly to the χ² computed from the
complete table. The computations, as outlined by these authors, involve the construction and subse-
quent multiplication of three r×c matrices. In this paper, the computations have been greatly simplified.
Each single degree of freedom χ² is given by a formula which is a function only of the observed fre-
quencies and which closely resembles the familiar short-cut formula for a four-fold table. A general ex-
pression is given which permits the construction of such formulas for contingency tables of any order.
The method is applied to some experiments which compare the effects of X and beta radiation on 
mitotic rates in grasshopper neuroblasts (Chortophaga viridifasciata). 
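
A sketch of such a partition for a 2×c table is given below. The abstract does not print the formula, so the expression used here is an assumption: it is the form commonly quoted for the 2×c case, in which component t compares column t+1 with the pooled first t columns while retaining the margins of the complete table. The function names are hypothetical, and every column total is assumed to be positive.

```python
def chi_square_total(a, b):
    """Ordinary Pearson chi-square for a 2 x c table with rows a and b."""
    A, B = sum(a), sum(b)
    N = A + B
    total = 0.0
    for aj, bj in zip(a, b):
        nj = aj + bj
        for observed, row_total in ((aj, A), (bj, B)):
            expected = row_total * nj / N
            total += (observed - expected) ** 2 / expected
    return total

def chi_square_components(a, b):
    """Single-degree-of-freedom components for a 2 x c table.

    Component t compares column t+1 with the pooled first t columns,
    but uses the margins A, B, N of the complete table.
    """
    A, B = sum(a), sum(b)
    N = A + B
    components = []
    sum_a = sum_b = 0  # running totals over the first t columns
    for t in range(len(a) - 1):
        sum_a += a[t]
        sum_b += b[t]
        n_next = a[t + 1] + b[t + 1]
        pooled = sum_a + sum_b
        numerator = N ** 2 * (b[t + 1] * sum_a - a[t + 1] * sum_b) ** 2
        denominator = A * B * n_next * pooled * (pooled + n_next)
        components.append(numerator / denominator)
    return components
```

For a table with c = 2 the single component reduces to the familiar four-fold formula N(a₁b₂ − a₂b₁)² divided by the product of the four marginal totals, and for larger c the components add up to the χ² of the complete table.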


The Effective Application of Statistical Methods Throughout an Industrial Organization. E. P. King,
Eli Lilly and Co.

This paper is concerned with non-technical problems of the industrial statistician. Problems which 
arise from the interrelations of statistician, administrator, and client are discussed. In particular, the list 
includes: 1) determining a statistician’s appropriate level of responsibility in an industrial organization, 
2) effectively communicating the results of applied statistics to top management, 3) dealing with over- 
optimistic and over-pessimistic clients, 4) determining the statistician’s proper role in long-term co- 
operative work, and 5) communicating effectively with clients. 

The general position is taken that none of these problems has a unique solution. The adequacy of 
the solution obtained in any particular instance appears to hinge on the capabilities of the individual 
statistician and how he utilizes them. For this reason, most of the suggestions made in this paper are 
directed toward the statistician, rather than toward the administrator or the client. 


Some Sampling Efficiencies of a Double Sampling Device. Leslie Kish, University of Michigan.


When listing addresses within sample blocks for surveys the field interviewer hastily assigns eco- 
nomic ratings of L, M and H (for low, medium and high) to dwelling units. The means and the standard 
deviations differ greatly among these strata with regard to social and economic characteristics. E.g., the 
means (and standard deviations) of the liquid assets holdings of spending units in the L, M and H strata
are respectively $660 ($1,700), $1,800 ($3,400) and $5,800 ($10,000). The strata comprise about 33%, 
38% and 7% respectively of the dwellings; 22% are without rating, chiefly in the open country. These 
strata may be used for the allocation of different sampling rates for survey interviews, with the aim of 
increasing precision. Tables of the results of computations are given for many items from a recent 
Survey of Consumer Finances, showing the precision per interview for several allocation schemes.
Through greater sampling rates for the “higher” strata, gains in precision up to 50% or more (as against 
proportionate sampling) may be had for some estimates, such as the mean income, mean liquid assets 
and the characteristics of the higher economic strata. However, this “loading” results in small to
moderate (5% to 20%) losses in precision for other items, such as the estimates of many proportions.
In the appendix, derivations are given for the variances of estimates of the total and of the mean of a
characteristic for members of a subclass obtained from a stratified random sample. These are respectively
and approximately:
$$\sum_h \frac{N_h}{n_h}\left[M_h\sigma_{yh}^2 + M_h\left(1-\frac{M_h}{N_h}\right)\bar{Y}_h^2\right]
\quad\text{and}\quad
\frac{1}{M^2}\sum_h \frac{N_h}{n_h}\left[M_h\sigma_{yh}^2 + M_h\left(1-\frac{M_h}{N_h}\right)(\bar{Y}_h-\bar{Y})^2\right],$$
where $M$ is the number of subclass members in the population; there are $M_h$ members among
the $N_h$ elements in the $h$th stratum; $n_h$ is the size of sample, $\bar{Y}_h$ is the mean for a characteristic and $\sigma_{yh}^2$
is its variance among the subclass members in the stratum.
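
The gain from "loading" the higher strata can be sketched with the liquid assets figures quoted above. Several assumptions are made for the sake of illustration: the unrated 22 per cent of dwellings is left out and the remaining shares are rescaled, finite population corrections are ignored, the sample of 1,000 interviews is hypothetical, and the optimum allocation is taken proportional to stratum size times standard deviation, which need not be any of the schemes tabulated in the paper.

```python
def variance_of_mean(weights, sds, sample_sizes):
    """Variance of a stratified mean, ignoring finite population corrections."""
    return sum(w * w * s * s / n for w, s, n in zip(weights, sds, sample_sizes))

def proportionate(weights, n):
    """Sample sizes proportional to stratum size."""
    return [n * w for w in weights]

def optimum(weights, sds, n):
    """Sample sizes proportional to stratum size times standard deviation."""
    products = [w * s for w, s in zip(weights, sds)]
    total = sum(products)
    return [n * p / total for p in products]

# Figures quoted in the abstract for liquid assets in the L, M and H strata.
shares = [0.33, 0.38, 0.07]           # the unrated 22% is left out (assumption)
weights = [s / sum(shares) for s in shares]
sds = [1700.0, 3400.0, 10000.0]
n = 1000                              # hypothetical number of interviews

v_prop = variance_of_mean(weights, sds, proportionate(weights, n))
v_opt = variance_of_mean(weights, sds, optimum(weights, sds, n))
relative_precision = v_prop / v_opt   # precision of optimum vs. proportionate
```

On these assumptions the ratio comes out a little under 1.5, which is of the same order as the gains of up to 50 per cent reported for mean liquid assets.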


The Effects of Bimodality and of Skewness in a Population on the Distribution of “t”. J. C. Layman
and R. A. Bradley, Virginia Polytechnic Institute.


The small sample test, known as the “t” test, for hypotheses on the mean of a normal population 
developed by “Student” and modified and extended by R. A. Fisher is still of outstanding importance
in statistics. For the strict validity of the test it is required that the sample be of independent observa- 
tions from a single Gaussian distribution. In practice these conditions are never more than approximately 
met. R. A. Bradley, in 1952, presented a general mathematical approach to obtaining approximations 
to the distribution of “t” for specified non-normal populations. Examples developed were restricted to 
symmetrically distributed populations. In the present paper, the effects of sampling from a mixed 
population with density function $f(x) = a_1\phi_1 + a_2\phi_2$, where $a_1 + a_2 = 1$; $a_1, a_2 \ge 0$ and $\phi_1$, $\phi_2$ are normal or
Gaussian densities with possibly different means and variances were partially investigated. This density
is of interest since, with changes in $a_1$, $a_2$ and the means or variances of the $\phi$’s, it may be used to repre-
sent populations which are bimodal or which have varying degrees of skewness. Two special forms of the
general density above were selected and studied in some detail for samples of size two. The results ob-
tained from the study of the two density functions considered indicate that the probabilities that $t$ ex-
ceeds preassigned values differ only slightly from the corresponding probabilities for the normal den-
sity. Thus, these results substantiate the opinion that the effects of moderate departures from normality
may not be serious in the analysis of variance of sample means. 
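
The comparison described above was made analytically; a rough check by simulation might look like the following sketch. It is not the authors' method. The mixture parameters passed in are whatever the reader chooses, the function names are hypothetical, and the only result relied upon is that Student's t with one degree of freedom (the case of samples of size two) follows the Cauchy law.

```python
import math
import random

def draw_mixture(rng, a1, mu1, sd1, mu2, sd2):
    """One observation from a1*N(mu1, sd1^2) + (1 - a1)*N(mu2, sd2^2)."""
    if rng.random() < a1:
        return rng.gauss(mu1, sd1)
    return rng.gauss(mu2, sd2)

def tail_probability(t0, a1, mu1, sd1, mu2, sd2, trials=200_000, seed=1):
    """Monte Carlo estimate of P(t > t0) for samples of size two."""
    rng = random.Random(seed)
    mean = a1 * mu1 + (1.0 - a1) * mu2   # true mean of the mixture
    hits = 0
    for _ in range(trials):
        x1 = draw_mixture(rng, a1, mu1, sd1, mu2, sd2)
        x2 = draw_mixture(rng, a1, mu1, sd1, mu2, sd2)
        spread = abs(x1 - x2)
        if spread == 0.0:
            continue  # t undefined; practically never happens
        # For n = 2, (xbar - mean) / (s / sqrt(2)) reduces to this ratio.
        t = (x1 + x2 - 2.0 * mean) / spread
        if t > t0:
            hits += 1
    return hits / trials

def normal_theory_tail(t0):
    """P(t > t0) for Student's t with one degree of freedom (a Cauchy law)."""
    return 0.5 - math.atan(t0) / math.pi
```

Calling `tail_probability` with a bimodal or skewed choice of parameters and setting the result beside `normal_theory_tail` for the same `t0` gives a direct, if approximate, view of how far the two probabilities differ.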


The Development of Census Tract Cities in Canada and the Dominion Bureau of Statistics Census
Tract Program. O. A. Lemieux, Dominion Bureau of Statistics.

In Canada, 14 of the principal cities are divided into Census tracts. Two, Winnipeg and Vancouver,
were tracted prior to the 1941 Census and the remainder between 1941 and 1951. 

The tabulation program was planned to give Census tract statistics for the various characteristics 
of the population such as sex, age groups, marital status, origin, official language, years of schooling, 
labor force, class of workers, occupation groups by sex and earnings of wage-earners. Statistics were also 
tabulated for households showing the number of persons and families per household and families showing 
children by age groups. Finally, statistics were tabulated for dwellings showing type, tenure, years of 
occupancy and dwellings showing various types of facilities. 

Many requests were received for these statistics by persons and organizations of different types. 

The inclusion of census tract tabulations in the over-all tabulation program is costly in time and 
money. It is, therefore, necessary for the Bureau to know how useful these statistics are in order to 
justify the expenditure. 


A Designed Experiment in Engineering. Fred C. Leone, Case Institute of Technology.


A single factor experiment is noted and its linear model is presented. This is followed by an evolu- 
tion to two and more factors and the inclusion of replication. Interaction is briefly discussed. 
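
For reference, the single-factor model and its extension to two factors with replication and interaction are conventionally written as below. The notation is the usual textbook one and is an assumption here, since the abstract does not print the model.

```latex
% One factor: level i, replicate j
y_{ij} = \mu + \tau_i + \varepsilon_{ij}

% Two factors with replication: levels i and j, replicate k
y_{ijk} = \mu + \alpha_i + \beta_j + (\alpha\beta)_{ij} + \varepsilon_{ijk}
```

Here $\mu$ is the general mean, $\tau_i$, $\alpha_i$ and $\beta_j$ are main effects, $(\alpha\beta)_{ij}$ is the two-factor interaction, and the $\varepsilon$ terms are independent random errors.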

A 3⁵ factorial experiment is then presented and analyzed. The object of the investigation was to
study the manner in which the prominent variables in the metal cutting process, and their possible 
interactions, contribute to the principal subdivision of the energy used in the process. The basic
assumption in this cutting process is that the total energy required is divided between that required in shearing
the chips from the parent work piece and that required in overcoming the friction between the chip 
and the top of the tool. 

The variables included in the investigation were the rake angle which was controlled between —5° 
and +15°, the yield strength of the material being cut, controlled between 100,000 psi and 200,000 psi, 
the cutting speed, controlled between 75 fpm and 125 fpm, the feed, controlled between .005 in/rev and
.015 in/rev and the width of cut, controlled between .060 inch and .150 inch.

It was concluded from the analysis, that all of the factors considered here have a significant effect 
on the energy distribution in the process. Most of the two factor interactions proved to be significant 
but none of the three or four factor interactions were significant. 


Responsibilities and Organizational Placement of the Statistical Consulting Group. Sebastian B.
Littauer, Columbia University.


Some industrial jobs are assumed to require so highly specialized techniques that people have been 
assigned to them expressly because of their knowledge of these special techniques. Yet frequently enough 
this narrowness of approach yields much less in the way of operably fruitful results than was anticipated. 
Hence it seems much more pertinent in the present discussion to focus upon the place of statistical 
method in the large organization rather than to start from the assumption that there must be specifically 
a statistical consulting group. 

The practical use of statistical method has pervaded so many aspects of modern profit-making and 
non-profit making enterprise alike that it is more efficient to look upon statistical method as the province 
of all hands concerned with problem solving rather than as a function of a special consulting group. The 
statistical thinker may be concerned with the rate of production, the prediction of sales, formulation of
inventory policy, control of accident frequency, the control of quality, the design of biological experi- 
ments, evaluation of military equipment, control of billing errors, and a host of other large or small 
problems, all in the same organisation. Efficient functioning of the statistical thinker requires as much 
familiarity with substantive fields of knowledge as with particular statistical techniques. In the most 
efficient use of statistical method, it is a fundamental aspect of methodology of complex problem solving, 
and as such, is intrinsic to operational analysis by interdisciplinary groups.

Considering large enterprise for the moment, there can be three levels of responsibility, namely, 
(1) general policy making, (2) translating general policy into operating policy, and (3) implementing 
specific operating techniques for realization of operating policy. The efficient use of statistical method in 
the light of this organizational framework by means of operations research, operations analysis, and 
operations engineering groups, is illustrated with examples from industry. 


Stability of Production Rates as a Determinant of Productivity Levels. Sebastian B. Littauer,
Columbia University.

Among the many factors which influence industrial productivity levels, the ultimate determinants 
are in the technical cause system usually referred to as the production process. The type of production 
process, the degree of mechanization, the worker skill, and other forces contributing to the total cause 
system are, of course, influenced by general economic and other social considerations, as are, in fact, the 
productivity goals. However this be, the degree of attainment of productivity levels is directly dependent 
upon the technical cause system. This system, regardless of the productivity goals aimed at in order to 
meet economic and other social objectives, can vary so widely and so irregularly as to bring about in 
practice great departures from these desired productivity levels. 

A productivity level is a technical objective which can be predicted from a given cause system. In 
order for prediction to be possible, there must be evidence that a cause system exists, whose parameters, 
mean and variability, are measurable. Otherwise stated, the rate of production of a given work cycle and 
the daily or hourly, or other useful time unit of production, must be statistically stable. If that condition 
does not exist, there is not a meaningful level of productivity. Furthermore, in order to attain aimed at 
productivity levels, one must be able to predict the outcome of the given productive cause system and 
therefore one must have a state of statistical control of the production process. 
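
One common way of judging whether a series of production rates is statistically stable is a control chart for individual observations. The sketch below assumes that approach, which the abstract does not specify; sigma is estimated from the average moving range, the three-sigma limits are the conventional ones, and the function names are hypothetical.

```python
def individuals_chart_limits(rates):
    """Centre line and 3-sigma limits for an individuals (X) chart.

    Sigma is estimated from the average moving range of successive
    observations divided by d2 = 1.128, the usual constant for ranges of two.
    """
    if len(rates) < 2:
        raise ValueError("need at least two observations")
    centre = sum(rates) / len(rates)
    moving_ranges = [abs(b - a) for a, b in zip(rates, rates[1:])]
    sigma = (sum(moving_ranges) / len(moving_ranges)) / 1.128
    return centre - 3.0 * sigma, centre, centre + 3.0 * sigma

def out_of_control(rates):
    """Indices of observations falling outside the 3-sigma limits."""
    lower, _, upper = individuals_chart_limits(rates)
    return [i for i, r in enumerate(rates) if r < lower or r > upper]
```

An empty list from `out_of_control` is evidence, though not proof, that the daily or hourly rates come from a single cause system; only then do the centre line and limits describe a meaningful productivity level.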

The technical procedures entailed in bringing about statistical stability of given productivity rates 
usually result in raising these rates until an asymptotic “optimum” level is reached. This “optimum”
level is dependent on many circumstances, dominant among which are, of course, the state of the manu-
facturing technology, financial forces, and management attitudes. Evidence of the effects of the technical
cause system and of the efforts made to bring about stable production rates is presented. 


Some Problems in the Development of a Synoptic Climatology. Thomas F. Malone and Robert G.
Miller, Massachusetts Institute of Technology.


In an attempt to ascertain the information concerning temperature and rainfall contained in 
atmospheric circulation patterns, linear operators relating circulation to weather elements have been 
derived from climatic data. These operators are able to explain up to 87 per cent of the variance of
mean daily temperature in a contemporary sense and up to 84 per cent for a one-day lag. The operators
explain 55 per cent of the variance of an areal average of twenty-four hour rainfall. In seeking an
explanation for the high predictability, operators were derived for the prediction of circulation patterns.
The correlation coefficient between predicted and observed twenty-four hour pressure changes has been 
computed for several cases of new data and is observed to go as high as 0.9. The derivation of operators 
of this kind is feasible only through the use of high-speed digital computers. Possible extension to a four- 
dimensional climatological model is discussed. 
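
The notion of "variance explained", both contemporaneously and at a one-day lag, can be illustrated with a deliberately reduced sketch. It uses a single predictor fitted by least squares, whereas the operators described above relate whole circulation patterns to the weather element; the function names are hypothetical and no data from the study are reproduced.

```python
def fit_line(x, y):
    """Least-squares intercept and slope of y on a single predictor x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

def variance_explained(x, y):
    """Fraction of the variance of y accounted for by the fitted line."""
    intercept, slope = fit_line(x, y)
    mean_y = sum(y) / len(y)
    residual = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    total = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - residual / total

def lagged_variance_explained(x, y, lag=1):
    """Same measure when x observed `lag` days earlier is used to predict y."""
    if lag < 1:
        raise ValueError("lag must be at least one day")
    return variance_explained(x[:-lag], y[lag:])
```

Here `x` would be a daily circulation index and `y` the daily temperature or rainfall; the lagged version pairs each day's weather with the circulation observed `lag` days before.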


The Interactions of Certain Contingency Tables. Nathan Mantel, National Cancer Institute.


The nature of interaction, and its ordinarily correct interpretation and use in the analysis of 
quantitative data, is discussed. 

A correct understanding of the nature of interaction should lead to correct and valid analyses of 
contingency table data. However, in practice, many analyses are made because they are quick and easy, or
to meet other desiderata. Examples are analysis by factorial chi-square, or use of the arc sine trans-
formation. For the purpose of finding interactions these are biased methods.

In any particular problem, a correct analysis must flow from an understanding of the problem at 
hand. Certain types of analysis and transformations which may be useful in particular problems are 
presented. The possibility of extending these ideas to more complex contingency tables is discussed.


Recent Advances in Government Statistics. Herbert Marshall, Dominion Bureau of Statistics.


An important objective of the Dominion Bureau of Statistics in recent years has been to speed up 
the issuing of various reports. In pursuing this objective, however, care has been taken not to sacrifice 
essential quality for the sake of earlier publication. While the attitude of the perfectionist may be avoided,
statistics should reflect adequately the real economic and social trends and processes which they sum- 
mate. Where statistics are admittedly only approximations their limitations should be clearly stated in 
publication. In fact it should be general practice to make every effort to specify the degree of accuracy 
of each statistical series. 

Canada’s Labour Force Survey was changed from a quarterly to a monthly basis in November 
1952, and a timing of release within 4½ weeks of the reference week achieved. Special efforts are being
made to study the problems of non-sampling errors. 

It has been decided to extend the mark-sense principle to the agricultural schedule to be used in the
1956 census. A mark-sense card was used very successfully in the 1951 population census. Time for 
completing the 1951 census was cut in half. 

Publication of Vital Statistics has been speeded up greatly by an overhaul of the tabulation program 
undertaken by the Bureau in cooperation with Provincial officials. The tabulation of the results of a 
nation wide sickness survey is furnishing information on concepts and definitions which will be of great 
assistance in the development of continuous statistics in this field. New schedules for hospital statistics, 
both operating and accounting, arising out of Dominion Provincial Conferences, are now in use and will 
result in much more complete information. 

In current agricultural statistics the use of probability sampling techniques is being extended to 
supplement the data received through non-random mail questionnaires. 

In the census of industry, questionnaires are being changed over from a production to a shipment 
basis. Other changes are being made in the annual schedule to make it conform more closely to business 
accounting, thus easing the burden of reporting, speeding of returns, and increasing the consistency of
the statistics. 

A considerable widening of the Bureau's information on inventories is in progress. In addition to 
the statistics on unfilled orders in manufacturing industries a series on new orders received each month 
is in preparation. In retail and wholesale trade new samples have been designed embracing more kinds 
of business and permitting better regional estimates to be made. An important development in trans- 
portation statistics is a pilot survey of road transport which it is hoped will be the start of national sur- 
veys. Quarterly statements of the Balance of International Payments now supplement the annual 
statement. These appear shortly after the end of the quarter. The main National Accounts also are 
available now on a quarterly basis. Distributions of individual and family incomes, by size, for the year
1951 have been completed and will be published soon. Work on applying the methods used in the 
construction of the Bureau’s index of industrial production (mainly manufacturing and mining) to the 
other sectors of the economy is far advanced. 

Work on an input-output table is progressing. An experimental table for the year 1949 is in course of 
preparation. Although experimental and relatively modest in scope it is expected to provide a guide for 
future development in this field, with special reference to the potentialities of the project for purposes of 
statistical improvement, integration and development. 


The Use of Discriminant Functions in the Estimation of the Proportion of Cases in a Given Population.
G. E. McCreary, Cornell University.

Approximately seventy trichotomous questions which were administered to 612 community people 
and 78 hospitalized psychoneurotics were being tested for their ability to yield estimates of the propor- 
tion of psychoneurotics in the community. Twenty of these questions were chosen as being the most 
reliable indicators of psychoneurosis.

In the first phase of the analysis we ignored the underlying dimensions being tapped by the test 
items and assumed that a linear function of the responses could be used to minimize the amount of over- 
lap, i.e., to maximize the discrimination between the two populations. The exact solution of this problem 
requires the calculation of a 20 × 20 matrix of the intercorrelations between items. For the three point
scale of the items the sums of squares and crossproducts reduce to a rather simple formula which can be 
calculated by cross-tabulating the responses of each item against one another. The extension of this 
formula to a k-point scale can easily be made. 
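
The exact solution referred to above amounts to solving a set of linear equations for the item weights. A sketch for just two items is given below so that the inversion can be written out by hand; the study itself used twenty items. It assumes the classical linear discriminant function, in which the weights satisfy S w = d with S the pooled within-group covariance matrix and d the difference between the group means. The function names and any figures supplied to them are hypothetical.

```python
def discriminant_weights(mean_a, mean_b, pooled_cov):
    """Weights w solving S w = (mean_a - mean_b) for two items.

    pooled_cov is the 2 x 2 pooled within-group covariance matrix S,
    given as ((s11, s12), (s12, s22)).
    """
    (s11, s12), (s21, s22) = pooled_cov
    det = s11 * s22 - s12 * s21
    if det == 0:
        raise ValueError("covariance matrix is singular")
    d1, d2 = mean_a[0] - mean_b[0], mean_a[1] - mean_b[1]
    # Cramer's rule for the 2 x 2 system.
    return ((s22 * d1 - s12 * d2) / det, (s11 * d2 - s21 * d1) / det)

def score(responses, weights):
    """Linear discriminant score of one respondent."""
    return sum(r * w for r, w in zip(responses, weights))
```

A respondent's score is then compared with a cutting point, and the proportion of community respondents scoring beyond it serves as the estimate of the proportion of cases, subject to the false positives and false negatives at that cutting point.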

The inversion of the 20 × 20 matrix in order to solve for the item weights is a considerable task on
hand calculators and is even a sizeable task if one has to program it for a semi-electronic calculator.
There were indications in the literature that certain approximations could be used in place of this long 
exact solution without increasing the error by any sizeable amount. Eight of these were tried with the 
same data as in the full discriminant analysis. However, none seemed as powerful, with this type of data, 
in reducing error as previous authors had indicated. 

Error was measured in terms of both Type I and Type II errors (false positives and false negatives) 
at various cutting points. The various approximations were compared as to their ability to separate the 
community mean from the hospital mean relative to the average amount of variation around each mean. 
The worth of the method was also measured in terms of the ability of the approximations to distinguish 
between areas with varying percentages of false positives and false negatives. The validity of each 
method of approximation was tested against 64 recheck interviews in which a psychiatrist rated com- 
munity respondents according to their degree of functioning. 
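
The two error measures named above can be sketched as follows. This is an illustration only: it treats every community respondent as a non-case, which the study itself would not assume, and the scores, the cutting points, and the function names are hypothetical.

```python
from statistics import mean, stdev

def error_rates(community_scores, hospital_scores, cut):
    """False positive and false negative rates when scores >= cut are called cases."""
    false_pos = sum(s >= cut for s in community_scores) / len(community_scores)
    false_neg = sum(s < cut for s in hospital_scores) / len(hospital_scores)
    return false_pos, false_neg

def separation(community_scores, hospital_scores):
    """Difference between group means relative to the average within-group spread."""
    spread = (stdev(community_scores) + stdev(hospital_scores)) / 2
    return (mean(hospital_scores) - mean(community_scores)) / spread

# Hypothetical discriminant scores for the two groups.
community_scores = [1, 2, 2, 3, 3, 3, 4, 5]
hospital_scores = [4, 5, 6, 6, 7, 8]

# Error rates at several cutting points, as (false positive, false negative).
rates_by_cut = {cut: error_rates(community_scores, hospital_scores, cut)
                for cut in (3, 4, 5, 6)}
```

Raising the cutting point trades false positives for false negatives, so methods are best compared over the whole range of cutting points rather than at a single one.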

We had hoped that some easy calibration would be nearly as error free and valid as the exact 
method. Hence we could have recommended a simple discrimination method to other investigators in 
other societies, in other interview situations, with differing items on the same subject. In fact we our- 
selves would have liked to use a simple calibration for additional data collected in ways differing from 
the original data on the points noted above. 

Although the exact discriminant analysis requires a considerable amount of work it is considerably 
more error free and valid. Of the four methods using intercorrelations between the items which require 
a medium amount of work the Horst-Smith is best. Of the four methods based on the marginal distribu- 
tions of the items and requiring little work, a weighting scheme developed by Macmillan works fairly 
well. This last method is, of course, not as error free or valid as the Horst-Smith or the exact discriminant 
analysis. If we use equal weights for the items, it is not possible to distinguish community populations 
which have differing percentages of psychoneurotics. 


Non-Parametric Estimation of Survivorship. Paul Meier. 

A standard problem in life testing and in medical follow-up studies is the estimation of some char- 
acteristic of the distribution of survival times (e.g., mean, median, or proportion surviving to a given 
time) from a sample, each member of which is observed only for a limited time. 

Certain advantages of non-parametric estimation procedures are noted, and it is shown that for 
decreasing time interval sub-divisions the familiar “Actuarial” estimators of the proportion surviving 
approach a common non-parametric limit which is essentially unbiased. This estimator is found to be 
the maximum likelihood solution of the non-parametric estimation problem, and its variance is shown 
to be well approximated by a formula proposed by Greenwood. 
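
A minimal sketch of the limiting estimator may help. It assumes the product form that the actuarial estimator takes as the intervals shrink to the observed death times (the form now usually called the product-limit estimator), together with Greenwood's variance approximation. The follow-up data and the function name are hypothetical, and observations lost at the same time as a death are counted as still at risk at that time.

```python
def product_limit(times, observed):
    """Product-limit survival estimate with Greenwood's variance approximation.

    times: follow-up time for each subject.
    observed: True if death was observed at that time, False if follow-up ended.
    Returns (time, proportion surviving, approximate variance) at each death time.
    """
    data = sorted(zip(times, observed))
    at_risk = len(data)
    surviving = 1.0
    greenwood_sum = 0.0
    results = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Gather every subject whose follow-up ends at time t.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            surviving *= 1 - deaths / at_risk
            if at_risk > deaths:
                greenwood_sum += deaths / (at_risk * (at_risk - deaths))
            results.append((t, surviving, surviving ** 2 * greenwood_sum))
        at_risk -= removed
    return results

# Hypothetical follow-up data: six subjects, two with incomplete observation.
times = [2, 3, 3, 5, 6, 8]
observed = [True, False, True, True, False, True]
estimates = product_limit(times, observed)
```

When no observations are incomplete, the Greenwood expression at the first death time reduces to the ordinary binomial variance of a proportion, which gives a convenient check on the arithmetic.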

Consideration is given to estimation of mean survival, which is the characteristic of first im- 
portance in many life testing problems. Again, the limiting form of the “Actuarial” estimator is essenti- 
ally unbiased and maximum likelihood, with variance approximately given by a formula suggested by 
Irwin. 


Measures of Industrial Costs in Relation to Industrial Productivity. Seymour Melman. 

Investigations of variation in industrial productivity levels indicate that the ratio of labor to 
machinery costs has been a major determinant of productivity changes over time and of differences at 
single times. Accordingly, the problems of measuring labor and machinery costs affect our ability to 
predict productivity levels. 

The cost to management of employing production workers may be estimated by means of job rates 
and hourly earnings data for single occupations or for an entire work force. Such measures become less 
useful estimates of the cost of employing industrial workers as non-wage payments are enlarged. There- 
fore the use of categories like average hourly earnings confers a bias on ratios of man-hour to machine- 
hour cost. 

Machinery cost may be measured in terms of particular machines or by means of price indexes of 
selected equipment. For such cost comparisons preferred machines are those whose use directly involves 
labor replacement. The significance of labor-machine cost comparisons is altered as labor costs become 
fixed and machine costs more variable in relation to output. 

Refinement of labor and machinery cost measures will make possible more precise estimates of the 
portion of variability of industrial labor productivity that must be explained by other factors. 





Some Problems in Sampling in Air Pollution. George H. Milly, Chemical Corps, U. S. Army. 


In order to put sampling into perspective the total air pollution problem is viewed as a series of 
processes comprising an operation. These processes include pollution generation, transfer and effects. 
The efforts involved in pollution abatement are seen to fall into several modifying processes, viz. control 
of generation, diminution techniques, and influence on the transfer mechanism. By consideration of the 
entire operation it is shown that knowledge of the process of transfer of pollutant is a requirement for 
effective attack on the broad air pollution problem. It is also shown that lack of this knowledge is the 
primary source of a number of major recurring sampling problems. Available atmospheric diffusion 
theory is identified as the basis for current knowledge of this transfer process and certain aspects of this 
theory are reviewed. The major shortcomings in respect to solving the sampling problems cited are 
considered and in general arise from the deterministic form of current theory which fails to account for 
variability of the atmospheric medium and gives no information on variability of the contaminant con- 
centration field. These shortcomings are seen to be deficiencies in fundamental approach, best overcome 
by application of statistical methods. 


Some Elementary Aspects of Applied Statistics. S. Monro. 


There are three general sources of variability in statistics used for analysis: variability of the thing 
being measured, errors of computation, errors in data. This paper is concerned with errors in data. 

Shewhart’s study of the ordered sequence of data is augmented by the studies of digital preference 
and of the greatest common divisor of first differences between data. The binomial and chi-square dis- 
tributions are used in the tests with appropriate criteria. Some useful rules are suggested. 
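
The two checks named above can be sketched briefly. The sketch assumes integer-valued readings, tests terminal digits against a uniform expectation with the chi-square statistic, and takes the greatest common divisor of successive differences; the readings and the function names are hypothetical, and the paper's own criteria are not given in the abstract.

```python
from collections import Counter
from functools import reduce
from math import gcd

def terminal_digit_chi_square(values):
    """Chi-square statistic (9 degrees of freedom) for uniform terminal digits."""
    counts = Counter(abs(v) % 10 for v in values)
    expected = len(values) / 10
    return sum((counts.get(d, 0) - expected) ** 2 / expected for d in range(10))

def gcd_of_first_differences(values):
    """Greatest common divisor of the differences between successive readings."""
    diffs = [abs(b - a) for a, b in zip(values, values[1:])]
    return reduce(gcd, diffs, 0)

# Hypothetical readings, all recorded to the nearest 5 units.
readings = [120, 125, 135, 130, 145, 150, 140, 155, 160, 165]

chi_square = terminal_digit_chi_square(readings)
common_divisor = gcd_of_first_differences(readings)
```

A common divisor larger than the nominal unit of measurement suggests coarser rounding than reported. With only ten readings the expected count per digit is 1, far too small for the chi-square approximation to be trusted, so the example shows the arithmetic rather than a usable test.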

Illustrative examples are presented and some general discussion of the causes and importance of 
data errors is included. 


Corporate Securities in the Pension Trust Picture. Roger F. Murray, Bankers Trust Company, New 
York. 


On broad economic grounds, corporate securities are especially suitable for the investment of 
industrial pension funds. To the extent that such investments contribute to productivity gains, they 
ease the burden of meeting commitments to retired workers. In general, private securities make a more 
direct and immediate contribution than public issues. 

Also, the companies which establish pension trusts and the managers of their investments have a 
familiarity with, and liking for, corporate securities. It is not surprising, therefore, to find the bulk of 
additions to pension trusts being invested in corporate bonds and stocks. 

Although regular buying of common stocks by pension trusts is a market factor of some significance, 
the impact on prices, both actual and potential, has been exaggerated. The buying is neither as large nor 
as concentrated as frequently supposed. Furthermore, it is sober buying for long-term yield which re- 
quires a showing that sound values are offered at prevailing prices. 

Even though pension trust purchases of equities will not inaugurate a “new era” for the stock mar- 
ket, they will continue to be a factor in the healthy broadening of the capital markets. Whether pension 
trust money bears a venture capital or a seasoned equity label, it clearly adds to the volume of available 
funds for risk taking. In the absence of restrictions or other inhibiting influences, the enlarged flow of 
funds will search out the full range of opportunities for the progress and growth of a dynamic economy. 


On the Use of the Range Instead of Standard Deviation. Gottfried E. Noether, Boston University. 


Many standard test and estimation procedures require the computation of the sample standard 
deviation. It is, however, often possible to replace the standard deviation by the much more easily 
computed sample range without appreciably reducing the accuracy of the method, at least from a 
practical point of view. The purpose of this paper is to discuss various range methods which have been 
suggested in the statistical literature in connection with problems involving the parameters of one or 
two normal populations. 

It is well-known that the efficiency of a single range estimate decreases rapidly as the number of 
observations increases. However, this tendency can be counteracted by dividing the sample into sub- 
samples and computing the mean of the various subsample ranges. The author has prepared a table 
giving the best possible division into subsamples of equal size for samples containing from 2 to 100 ob- 
servations. This table also provides the necessary constants for computing point estimates and con- 
fidence limits for the standard deviation σ of a normal distribution and for carrying out tests of hy- 
potheses concerning σ. The same information is given for the corresponding problems involving the 
mean of a single normal distribution as well as the difference of the means of two normal distributions. 

The efficiency as well as the standardized error of these procedures are discussed. It turns out that 
for many practical purposes the loss in accuracy is negligible and certainly compensated by the greater 
simplicity and speed of range methods. 
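
A minimal sketch of the mean-range estimate of σ follows. The author's table is not reproduced in the abstract, so the sketch substitutes the familiar expected-range factors for normal samples (the d2 constants of quality control charts); the data, the subsample size, and the function name are hypothetical.

```python
# Expected range of a normal sample of size n, in units of sigma (d2 factors).
D2 = {2: 1.128, 3: 1.693, 4: 2.059, 5: 2.326, 6: 2.534,
      7: 2.704, 8: 2.847, 9: 2.970, 10: 3.078}

def sigma_from_ranges(sample, subsample_size):
    """Estimate sigma as the mean subsample range divided by the d2 factor."""
    if len(sample) % subsample_size != 0:
        raise ValueError("sample must divide evenly into subsamples")
    groups = [sample[i:i + subsample_size]
              for i in range(0, len(sample), subsample_size)]
    mean_range = sum(max(g) - min(g) for g in groups) / len(groups)
    return mean_range / D2[subsample_size]

# Hypothetical measurements, split into two subsamples of five in recorded order.
sample = [10.2, 9.8, 10.5, 10.1, 9.6, 10.4, 9.9, 10.0, 10.3, 9.7]
sigma_estimate = sigma_from_ranges(sample, 5)
```

The subsamples must be formed without regard to the observed values, for example in order of recording; grouping after sorting would bias the ranges downward.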


Comparison and Evaluation of Microchemical Methods. C. L. Ogg, Eastern Regional Research Labora- 
tory, Philadelphia. 

A series of collaborative studies of microchemical methods has been conducted during the last 
seven years to determine the most reliable methods for microanalysis and to improve on these methods 
when possible. The methods shown to be best by these studies have been adopted as official procedures 
by the Association of Official Agricultural Chemists, provided the results obtained in collaborative 
tests were sufficiently accurate and precise on both the intra- and interlaboratory basis. 

To determine which method for a given determination produced the best results, two pure com- 
pounds of known composition were sent to a group of 20 or 30 microanalysts with the request that they 
make at least quadruplicate analyses on the samples by the method they normally used and report all 
data obtained. Each collaborator was also asked to complete and return a form giving uniform and 
complete information on his procedure. 

The mean and variance of the data from each laboratory were calculated and tabulated. Inter- 
laboratory precisions obtained by different methods were compared by applying the F test to the 
variance of the collaborators’ mean values. Student’s t test was used to determine whether or not the 
difference between the means for two methods was significant. Knowing the true value, procedures which 
introduced constant errors could likewise be ascertained. 
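
The comparisons described above can be sketched as follows. The sketch assumes the pooled-variance form of Student's t and a variance ratio with the larger variance in the numerator; the laboratory means, the true value, and the function names are hypothetical, and the resulting statistics would still have to be referred to F and t tables.

```python
from math import sqrt
from statistics import mean, variance

def f_ratio(means_a, means_b):
    """Ratio of the larger to the smaller variance of the laboratory means."""
    va, vb = variance(means_a), variance(means_b)
    return max(va, vb) / min(va, vb)

def pooled_t(means_a, means_b):
    """Student's t for the difference between two method means (pooled variance)."""
    na, nb = len(means_a), len(means_b)
    pooled = ((na - 1) * variance(means_a) + (nb - 1) * variance(means_b)) / (na + nb - 2)
    return (mean(means_a) - mean(means_b)) / sqrt(pooled * (1 / na + 1 / nb))

def constant_error(means, true_value):
    """Average departure of a method's laboratory means from the known composition."""
    return mean(means) - true_value

# Hypothetical laboratory means (per cent carbon) for two methods.
method_a = [69.92, 70.05, 69.98, 70.10, 69.95]
method_b = [70.20, 69.70, 70.45, 69.60, 70.30]
true_value = 70.00  # hypothetical known composition of the pure compound

precision_ratio = f_ratio(method_a, method_b)
t_statistic = pooled_t(method_a, method_b)
bias_a = constant_error(method_a, true_value)
bias_b = constant_error(method_b, true_value)
```

Where the F ratio indicates clearly unequal variances, the pooled form of t is questionable and the comparison of means should be interpreted with caution.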

In addition to determining the best general method, alternate procedures within a method were 
evaluated using the same statistical tests. The data obtained by the general method were subdivided 
usually into two groups representing alternate steps in the procedure. The F and t tests were applied to 
these two groups of data to determine which, if either, was the better. When a significant difference was 
found the better step was written into a tentative method. If no significant difference appeared, the 
simpler, more easily applied, or the more commonly used technique was selected. 

The tentative procedure thus obtained was tested collaboratively and if both the intra- and 
interlaboratory accuracy and precision were satisfactory, the method was adopted as an official pro- 
cedure by the Association. 

The fine cooperation of many microanalysts, plus the application of a few simple statistical tests 
to the data submitted by them has permitted the Association not only to select the best methods pres- 
ently available but also to improve on existing procedures. 

The methods for which official procedures have been adopted are carbon and hydrogen, nitrogen by 
the Kjeldahl procedure, sulfur, chlorine and bromine. Methods for nitrogen by the Dumas procedure, 
and for methoxyl, ethoxyl and acetyl groups are under study. 


Meteorological Applications of Statistics. H. Panofsky, Pennsylvania State College. 

The paper describes some of the uses to which meteorologists have put analysis of spectra and cross 
spectra. Applications range from forecasting the diffusion of air pollutants and airplane design to the study 
of the general circulation of the atmosphere. 


Economic Projections by the U. S. Department of Commerce. Louis J. Paradiso, U. S. Dept. of Com- 
merce. 

Although the Department of Commerce does not publish forecasts of business conditions, the 
Office of Business Economics publishes jointly with the Securities and Exchange Commission the results 
of surveys of businessmen’s intentions to purchase plant and equipment. In addition, the Department 
has published on two occasions analyses of prospective markets based on stated assumptions. 

The first of the market studies, “Markets After the War,” appeared in March 1943 and was widely 
used by businessmen in planning for the postwar period. The general conclusion of this analysis was that the 
techniques employed were adequate in indicating the rough magnitude of the potential over-all postwar 
increase in economic activity, but were less reliable for the detailed segments. 

“Markets After the Defense Expansion,” published by the Department in late 1952, was an 
attempt to appraise business prospects in the period 1953-55, particularly those factors making for a 
continuation of a high level of activity or those producing some deficiency of demand. This involved a 
careful examination of the prospects for each of the major segments of the gross national product. The 
implications of the defense program were thoroughly explored, and a survey of businessmen’s anticipa- 
tions was made for 1953-55. The major conclusions of this study were that 1953 would be another year 
of good business, but that prospects for 1954 were somewhat more uncertain. Actually, the trend of 
defense expenditures developed differently from that projected and taxes were reduced. Such develop- 
ments, as the report states, would have important consequences on the course of business. The study was 
helpful in concluding that the short-run prospects for business were favorable and that plant and equip- 
ment outlays would continue high. 

A fruitful approach to the problem of forecasting seems to be through surveys of intentions to 
spend, such as the Department of Commerce-Securities and Exchange Commission annual and quarterly 
surveys of planned business investment. The annual surveys have correctly anticipated the direction 
of change of such outlays for all years since 1945 and have closely approximated the magnitude of the 
change except for 1950. The results have been somewhat less reliable by industry, by company, and by 
quarters. 

In summary, despite the difficulties and shortcomings of the techniques applied to the development 
of projections, the results have proved to be highly useful to the business community and have been of 
value to those needing guides as to the general trend of economic activity. 


Cross-Spectral Analysis of Time Series. Willard J. Pierson, Jr. and Leo J. Tick, New York Univer- 
sity. 

The notion of cross-spectra is explained in terms of the response of a linear system to a random dis- 
turbance. The transfer function is, of course, the ratio of the output to input spectrum. The “phase in- 
formation” is shown to be contained in the cross-spectra, and a phase function is defined as the arc 
tangent of the ratio of the quadrature and co-spectrum (which are the imaginary and real part of the 
cross-spectrum respectively), which reduces to the usual notion of phase for a single frequency spectrum. 

The use of these spectra for estimation of the parameters of a linear system is shown. Problems of 
identification arise here. 
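
The phase function can be illustrated for the single-frequency case mentioned above. The sketch uses raw Fourier coefficients at one frequency rather than a smoothed spectral estimate, and it adopts one sign convention (cross-spectrum taken as the conjugate of the input coefficient times the output coefficient); other authors define the quadrature spectrum with the opposite sign. The series and the function names are hypothetical.

```python
import cmath
import math

def fourier_coefficient(series, k):
    """Discrete Fourier coefficient of the series at frequency index k."""
    n = len(series)
    return sum(v * cmath.exp(-2j * math.pi * k * t / n)
               for t, v in enumerate(series))

def cross_spectrum_phase(x, y, k):
    """Arc tangent of quadrature over co-spectrum at frequency index k."""
    cross = fourier_coefficient(x, k).conjugate() * fourier_coefficient(y, k)
    co_spectrum, quadrature = cross.real, cross.imag
    return math.atan2(quadrature, co_spectrum)

def amplitude_ratio(x, y, k):
    """Ratio of output to input amplitude at frequency index k."""
    return abs(fourier_coefficient(y, k)) / abs(fourier_coefficient(x, k))

# Hypothetical input and output: the output has twice the amplitude and lags by 0.7 radian.
n, k, lag = 64, 5, 0.7
x = [math.cos(2 * math.pi * k * t / n) for t in range(n)]
y = [2 * math.cos(2 * math.pi * k * t / n - lag) for t in range(n)]

phase = cross_spectrum_phase(x, y, k)
gain = amplitude_ratio(x, y, k)
```

Under this convention a lagging output gives a negative phase, so the recovered value is the negative of the imposed lag. Using the two-argument arc tangent keeps the phase in the correct quadrant when the co-spectrum is negative.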


The Process of Learning by Experiment. E. W. Pike, Raytheon Mfg. Co. 

The process of learning by experiment is presented as a servo loop whose action is to bring a theory, 
a mathematical model of reality, into closer conformity with reality. The elements of this process are: 
(1) the theory; (2) the experimental prescription, an exact statement of the operations constituting an 
experiment related to the theory; (3) the experimental population, the unknown potential results of 
indefinite repetition of the experimental prescription; (4) the taking of samples from the experimental 
population by carrying out the experimental prescription in reality, one or more times; (5) the in- 
ference from the experimental samples to the parameters of the experimental distribution; (6) the com- 
parison of these inferred parameters with those predicted by the theory; (7) the modification of the 
theory to conform to the result of this comparison. A secondary loop of statistical control information 
goes backwards around element (4). 

Statisticians are, in the broadest sense, specialists in this process. Their active interest is usually 
limited to the hypothesis testing in (5) and (6), to the control loop around (4), and to the probabilistic 
models which are used in many branches of science (physics, genetics, economics). 

While all these elements are present in any experiment, the relative emphasis may shift so greatly 
from science to science that the common process is hard to recognize. For example, in agronomy, the 
type-science of current experimental statistics, the statistical inference (5) and (6) dominates the scene. 
In physics, it is negligible, while the control loop and the probabilistic models hold the center of the 
stage. Other sciences show other patterns. 

The historical development of these concepts by Galileo, Bridgman, Shewhart, Fisher, and others 
is sketched. 


Time Series Problems in Aeronautics. Harry Press, Langley Laboratory. 

This paper reviews some recent applications of random process theory to problems in aeronautical 
engineering. In aeronautics, a number of problems occur in which the behavior of the airplane under the 
influence of a random type disturbance is of concern. As examples of this type, the effects on the airplane 
of such disturbances as atmospheric turbulence, fluctuating aerodynamic forces associated with buffet- 
ing, and the irregularities of runway surfaces will be briefly described. These disturbances, in some cases, 
give rise to severe structural loads and violent airplane motions. Because these disturbances can affect 
the safety, economy, and performance of an aircraft, the successful design and operation of aircraft re- 
quire the determination and control of the airplane behavior under the influence of these disturbances. 

In recent years, progress has been made in the analysis of some of these problems by applications 
of random process theory and in particular, the techniques of generalized harmonic analysis. The ap- 
proach generally involved in these studies is to determine the spectral characteristics of the disturbance 
either experimentally or theoretically and then to apply the results obtained to the calculation of the 
spectra and time-history characteristics of the airplane response. In order to provide a concrete illustra- 
tion of the use of this approach, results obtained in a theoretical and experimental study of the behavior 
of a high-speed missile in flight through rough air are described. 

In the applications of random process theory to aeronautical problems, a number of statistical 
problems are encountered in both the measurement of the disturbance and in the calculation of the air- 
plane response. These problems are frequently made difficult by the complex nature of the disturbance. 
For example, atmospheric turbulence is a four-dimensional random process (three space and a time 
dimension) involving the three components of the air velocity vector. As a consequence, the problem of 
measuring the characteristics of the turbulence requires the determination of the spectra and cross- 
spectra for the velocity components. Because of these complexities, such measurements are expensive 
and require efficient experimental designs with particular regard to the sampling variability. A sampling 
theory for spectra estimation has so far only been developed for the case of a single disturbance of a 
Gaussian type. The sampling reliability of cross-spectra estimates, however, has as yet not been estab- 
lished even for the Gaussian case. In addition to these sampling problems, there exist other problems 
concerned with the airplane response calculations. For the simplest case of a Gaussian disturbance and a 
linear system, the results are straight-forward since the output spectrum may be calculated directly and, 
in principle, completely specifies the response. For non-linear systems and non-Gaussian processes, the 
problems are further complicated and a need exists for further work in these areas. 


Migration as It Facilitates or Retards Occupation and Employer Mobility. Albert J. Reiss, Jr., 
Vanderbilt University. 

The paper describes and analyzes the nature and extent of employer and occupation shifts for 
migrant men in an urban work force. The data for employer, occupational and residential shifts are 
for 2,499 sample cases of white males aged 25 to 64 years in the four cities of Chicago, Los Angeles, 
New Haven and Philadelphia in 1950. Approximately 16 per cent of all men at work in these cities in 
1950 were migrants. 

Migrants, as expected, usually changed their employer in their first post-migration job. Persons who 
migrated for, or as, an employer were surprisingly stable in their attachment to an occupational level as 
compared with those who changed employers. The older the migrant, the greater the difficulty in ob- 
taining work at the pre-migration level, if employers are changed in migration. 

Attachment to an employer varies with the specific occupational assignment of the migrant both 
prior to, and after, migration. Migrant men in all non-manual occupation groups, except sales, were less 
likely to change employers when moving than were migrant men in all manual occupation groups. 

There is a moderate degree of stability in attachment to an occupational level when the jobs before 
and after migration are compared—58 per cent of all migrant moves are stable. There is greater risk 
and uncertainty in making occupation shifts through migration than if one makes occupation shifts 
without migration, regardless of the age of the migrant. The older the migrant, the greater is 
this risk and uncertainty. 

Migrants in the higher status non-manual and manual occupations who change occupational level 
are more likely to be downward than upward mobile after migration, while migrants in the lower status 
non-manual and manual occupations are more likely to be upward than downward mobile after migra- 
tion. Semi-skilled jobs are the single most accessible job to all migrants who make occupation shifts. 
Migrants in each major occupation group are somewhat more likely to make occupation shifts to related 
than to distant occupations, except among proprietors. 


Some Admissible Tag-Recapture Procedures. Douglas S. Robson, Cornell University. 

A discrete chance variable $X$ has the probability distribution $p(x, u)$, where $u$ is a positive integer 
or zero. $X$ is a sufficient statistic and the likelihood ratio is monotone in the following sense: 

(a) if $p(x, u_0) = 0$ then $p(x, u) = 0$ either for all $u \le u_0$ or for all $u \ge u_0$; 

(b) $p(x_1, u_2)\, p(x_2, u_1) < p(x_2, u_2)\, p(x_1, u_1)$ if and only if both sides are not zero and $x_1 < x_2$, $u_1 < u_2$. 

An estimate of $u$ is to be based upon a single observation $x$. The loss due to making the decision $u = j$ 
when the true state is $u = i$ is $|i - j|$. Only integral valued decisions are allowed, and a decision function 
$\delta$ is defined as $\delta(x) = \{\delta_a(x)\}$, $a = 0, 1, \ldots$, ad inf., where $\delta_a(x) \ge 0$, $\sum_a \delta_a(x) = 1$. A decision function $\delta$ is 
admissible if and only if 

(i) for every possible $x$ there exists an integer $a_x \ge 0$ such that $\delta_{a_x}(x) + \delta_{a_x+1}(x) = 1$; 

(ii) $\delta_a(x) > 0$ implies $p(x, a) > 0$; 

(iii) $x < y$ implies $a_x \delta_{a_x}(x) + (a_x + 1)\, \delta_{a_x+1}(x) \le [a_y \delta_{a_y}(y) + (a_y + 1)\, \delta_{a_y+1}(y)]$, 

where $[c]$ is the largest integer in $c$. Proof of the necessity of the monotonicity of $\delta(x)$ was given by H. 
Rubin in “A complete class of decision procedures for distributions with monotone likelihood ratios,” 
abstract, Ann. Math. Stat., 22 (1951), for a more general problem of which this is a special case. If both 
sample space and parameter space are finite then there exists a unique admissible minimax procedure $\delta^*$ 
which in general is truly randomized and which attains its maximum risk on the parameter points 
$u_0 \le u_1 \le \cdots \le u_{n+1}$ and has the property $u_i \le \sum_a a\, \delta^*_a(x_i) \le u_{i+1}$, $i = 0, 1, \ldots, n$. Numerical examples 
of some simple ball and urn models are given in which the risk functions of the minimax and classical 
solutions are computed. 


Foreign Financing of Canadian Investment in the Post-War Years. A. E. Safarian and E. B. Carty, 
Dominion Bureau of Statistics. 


Despite very large imports of foreign capital in recent years, the net contribution by non-residents 
to the savings used for all types of physical investment in Canada has been small. Canadian sources of 
saving from 1946 to 1953 were large enough to finance all but about 4 per cent of net capital formation 
(after deducting depreciation allowances), and all but about 9 per cent of gross capital formation ag- 
gregating more than $33 billion. But not all of the savings of Canadians were used to finance investment 
in Canada. Much was invested abroad, particularly through government loans, and there were sub- 
stantial retirements of debts which had been contracted abroad. Canadian sources of saving have 
directly financed about three-quarters of both net and gross capital formation in Canada since the war, 
with non-resident sources directly financing the remainder. The Canadian economy actually generated 
most of the savings necessary to finance this remaining portion of the investment program, but part was 
used abroad, part represents earnings retained in Canada by non-residents, and part was set aside to re- 
place durable assets owned by non-residents in Canada. 

These estimates measure only net use of foreign resources and direct foreign financing in relation to 
capital formation in Canada. They are not indicative of whether the underlying investment decisions 
originated within or outside Canada. Nor do they indicate such other aspects of foreign financing as the 
availability of venture capital and the borrowing of techniques. 

These findings are in sharp contrast to the extent of foreign financing in previous periods of heavy 
investment. In the period of 1926 to 1930, for example, the contribution of non-residents on balance was 
about 25 per cent of net capital formation while direct foreign financing of net capital formation (neglect- 
ing outflows from Canada) was nearly 50 per cent. The corresponding figures for the period 1946 to 
1953 were 4 per cent and 24 per cent respectively. 

The tempo of foreign financing has increased greatly since 1949. For many years before this Canada 
had been a net exporter of capital. But direct foreign financing of investment in Canada was one-fifth of 
the total even in 1946 to 1949. In 1950 to 1953 Canada’s net use of foreign resources covered more than 
10 per cent of capital formation, while direct foreign financing rose to about 25 per cent. 


The Best Linear Estimates of the Mean and Standard Deviation of Different Populations Obtained from 
Singly and Doubly Censored Samples. A. E. Sarhan, University of North Carolina. 


The best linear estimates of the mean and standard deviation of the U-shaped, the rectangular, 
parabolic, triangular and the double exponential distributions from singly and doubly censored samples 
are worked out for small samples. The variances of the estimates and their efficiencies are calculated. 
The behavior of the coefficients for the largest and smallest observations is noted in every distribution 
as the number of the unknown observations increases (n is fixed). These coefficients and the relative 
efficiencies of the estimates are found to form a sequence in these different populations. The effect of the 
tails on the estimates and their variances is also discussed by considering a skewed distribution. 


Government Securities in the Corporate Pension Trust Picture. R. Duane Saunders. 


Since 1949 the Treasury has been receiving, on a confidential basis, reports from bank administra- 
tors on the Federal securities held for the account of corporate pension trust funds. From this material 
certain conclusions can be drawn. 

The number of uninsured pension plans is significantly larger than had been previously estimated. 
For 1951 the accepted estimate was that there were under 2,000 uninsured pension plans, whereas the 
Treasury survey (which is not all-inclusive) at that time covered 3,400 corporate pension trusts. 

In the period since 1949 corporate pension trusts reported in the Treasury survey have increased 
from 1,900 to 4,900, an increase of about 150%. The rate of increase, however, has been moderating. 

There is a seasonal pattern in the asset growth of these corporate pension trusts. First quarter con- 
tributions are large and then decline in the following quarters. Total asset data were only recently re- 
quested from the survey respondents and these replies showed an increase in assets of 7% in the first 
quarter of 1954 and about 4½% in the second quarter. On the basis of these reports it is estimated that 
the total assets of corporate pension trusts were about $10½ billion in June 1954. 

In general the investment practices of these funds are similar to the patterns evident in other long- 
term investor groups. During the war years the proportion of the funds invested in Governments rose 
to almost half of total assets. Since then the share of Governments has declined to about 25%. However, 
the asset growth has been so marked that the dollar amount invested in Governments since the early 
postwar years has more than doubled. 
Characteristically these trust funds hold longer term Government securities. But like other investor 
groups these portfolios have been declining in their average length of time to maturity. This, in part, is a 
result of the shortening of the Federal debt itself, but in addition, investment managers have more than 
doubled their holdings of short-terms as a per cent of total Governments. At the other end of the scale 
the holdings of savings bonds are large, nearly a third of Government portfolios. This was to be ex- 
pected: as the number of trusts is large, the annual limit on purchases has only a minor influence. Also, 
savings bonds are especially suitable for these funds. 

The broader generalizations do hide one of the most striking features of the investment practices 
of corporate pension trusts. There is an extremely wide diversity in the investment practices of these 
funds. Banks administering sizeable funds show from 9% to 66% of total assets invested in Governments; 
that same diversity is evident in all the reports right down through the smaller funds. The appeal 
of savings bonds as an investment medium also shows this same diversity. Although some of this variety 
may result from the relative newness of many corporate pension trusts, it is more likely that the lack of 
uniformity in the structure of pension plans themselves is the major reason. Pension plans are in large 
part specific to the individual corporation, and the pension problems of each firm have 
been met in a variety of ways. Until such time as the structure of the pension plans themselves becomes 
more uniform, these diverse investment practices will continue. 


The Survival of Red Blood Cells. Marvin A. Schneiderman, National Cancer Institute. 


Mathematical models of the survival of transfused red blood cells are considered, and some of their 
limitations are discussed. A chronological development is given showing the emergence through biological 
experiments of new concepts of the behavior of red blood cells, which follow as a consequence of the 
simple mathematical models. Treating the population of red blood cells as a stationary population, a 
short discussion is given showing how birth rates can be found from a consideration of the survivors of a 
random cross section of the population in which the ages of the individual members are not known. An 
experimental technique, which is very useful from the biologist’s point of view, radioactive tagging of 
the red blood cells, leads to some problems in estimation that have not yet been solved. 


Developing Work Units in a Child Placing Agency. E. E. Schwartz, Social Security Administration. 

Units of count used in many social work agencies are not suitable for the purposes frequently made 
of them. The four types of count currently in use in agencies characterized by the use of the case work 
process are case counts, activity counts, staff counts, and financial data. The basic unit, the “case”, was 
originally intended to relate to the individual or family receiving service. The practice has developed of 
attempting to load onto this term the burden of measuring the case workers’ responsibility, the volume 
of work performed by the case worker, and the extent of the services provided. Because the “case” is not, 
in fact, a homogeneous unit of count describing work performed or service delivered, it cannot be related 
with precision to either the count of personnel providing service to get average work loads or to agency 
expenditures for service to get costs per unit of service. Both of these important averages require meas- 
urements which relate the use of agency resources to the services provided. A pattern of units of count 
for social work agencies is set forth to measure input-output relationships through a statistical work 
measurement procedure. A pilot study now in process to test the feasibility of applying work measure- 
ment to a child placing agency is described. Particular emphasis is placed on the development of work 
units and problems encountered in their application to programs for placing children for adoption and 
for foster care. 


How Good are Current Statistics for Following Economic Changes? William H. Shaw, E. I. du Pont 
de Nemours. 

The goodness of current statistics for following economic changes is analyzed first by discussing how 
good they need to be, and second by testing the needed goodness against four standards. These stand- 
ards—accuracy, timing, correspondence with market reality, and classification and detail—have to be 
applied qualitatively because most key statistical series are not susceptible to quantitative tests of 
adequacy. 

Appraisals of goodness using each of the standards are presented for national income and product 
figures, construction activity estimates, plant and equipment expenditures, the Department of Com- 
merce measures of sales and inventories, the Federal Reserve Board indexes of industrial production, 
and the major employment estimates. Even though the appraisals are slanted deliberately toward 
realistic needs rather than unattainable ideals, substantial defects or gaps are noted for most of the 
series. 

The fact, however, that many improvements in current statistics are needed should not obscure the 
more important fact that twenty years ago most of the appraised series did not even exist. Moreover, 
significant quality advances have been made during the past five years. The producers of the various 
series have used their modest resources well but for the most part have been unable to get even small 
additions to these resources. The problem of getting additional resources is unlikely to be resolved 
satisfactorily unless economists and business analysts can do a much better job than they have done to 
date of showing a positive relationship between good statistics and the making of good policy and 
managerial decisions. 


Optimum Sampling in Binomial and Multinomial Populations. P. N. Somerville, Virginia Polytechnic 
Institute. 

Let $\Pi_0, \Pi_1, \ldots, \Pi_k$ represent $k+1$ multinomial populations with unknown parameters $a_{0j}$, 
$a_{1j}, \ldots, a_{kj}$, $j = 1, 2, \ldots, p$. We wish to choose the population $\Pi_i$ for which the function $\sum_{j=1}^{p} \theta_j a_{ij}$ is 
greatest. If the sampling must be done in one stage, then results stating the “optimum” amount of 
sampling that should be done prior to making a choice are given. The “optimum” sample size is the one 
which minimizes the maximum expected loss, taking into account the amount of use to be made of the 
choice, the cost of sampling and the cost of making a wrong decision. 


Quantitative Methods in Physical Anthropology. J. N. Spuhler, University of Michigan. 


Quantitative methods have been of some, but quite unequal, use in each of the three general prob- 
lems of primary interest in physical anthropology during the past 100 years: (1) Human evolution, (2) 
Classification of the living varieties of man, and (3) Comparative growth and development of man. This 
paper is concerned mainly with the problem of classification. 

For many human populations of classificatory interest, probability sampling based on available 
methods would be relatively simple and routine. Failure to employ satisfactory sampling methods is one 
of the major defects in modern anthropometric practice. But in the case of certain populations of 
anthropological interest, adequate mathematical models are not available. The problem of defining and 
sampling a local breeding population centering about some point within a large population with area 
continuity, variable density, and high internal mobility is used to illustrate the need for new statistical 
models in the study of physical anthropology. 

Three methods of estimating the biological distance between two or more human populations using 
anthropometric data are discussed: (1) The seriatim method where differences between means of char- 
acters are judged or tested taking the characters one at a time, (2) The Coefficient of Divergence for 
Multiple Measurements (CD) of Clark, and (3) The $D^2$ methods of Mahalanobis and Rao. 

Three difficulties in using the seriatim method are: (1) All characters are assumed to have equal 
classificatory weight, (2) Correlation of characters is ignored, and (3) The probability statements of 
significance tests are not appropriate measures of biological distance. $D^2$ overcomes all three difficulties. 
$D^2$ is expensive in computational labor when more than a few populations and characters are classified, 
but this expense is not large (when machine methods are used) compared to the cost of collecting the 
original data. 

CD and $D^2$ are identical for zero correlation. CD overcomes the first and third difficulties inherent in 
the seriatim method, but (where $r \neq 0$) not the second. Computational labor is relatively small for CD, 
even for relatively large numbers of populations and characters. CD is an approximate method of unknown 
reliability. CD suffers from lack of a significance test (when $r \neq 0$), a difficulty which may be overcome 
partially by using large samples. CD and $D^2$ are shown to reach identical classificatory results 
(membership in “clusters”) in a study of 8 populations using 8, 9, and 12 characters. 

An advantage of CD over $D^2$ is that it may be computed where data are restricted to means, a 
very common situation in anthropometric reports published during the past 100 years. Where individual 
data are not available, CD is to be preferred over the seriatim method for estimates of biological distance 
used in anthropological classification. 
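
The $D^2$ distance discussed in this abstract can be illustrated with a minimal Python sketch. It is not the author's computation: the function names, the two-character means, and the pooled covariance matrices are hypothetical. It evaluates $D^2 = d' S^{-1} d$, where $d$ is the difference of the two mean vectors and $S$ the pooled covariance matrix, and shows that with zero correlation the value is simply the sum of squared standardized differences.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Work on an augmented copy so the inputs are left untouched.
    m = [list(map(float, row)) + [float(rhs)] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def d_squared(mean1, mean2, pooled_cov):
    """Mahalanobis D^2 = d' S^{-1} d, with d the difference of the mean vectors."""
    d = [u - v for u, v in zip(mean1, mean2)]
    w = solve(pooled_cov, d)  # w = S^{-1} d
    return sum(di * wi for di, wi in zip(d, w))


# Hypothetical means of two characters in two populations.
means_a = [170.0, 55.0]
means_b = [166.0, 57.0]

# Zero correlation: D^2 is (4/4)^2 + (2/2)^2.
uncorrelated = d_squared(means_a, means_b, [[16.0, 0.0], [0.0, 4.0]])

# Same variances, correlation r = 0.5 between the two characters.
correlated = d_squared(means_a, means_b, [[16.0, 4.0], [4.0, 4.0]])
```

In this made-up case the two populations differ in opposite directions on positively correlated characters, so allowing for the correlation increases the distance. That is the kind of information a character-by-character comparison ignores.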


Unemployment Statistics and Economic Policy Uses. Charles D. Stewart, U.S. Department of Labor. 
Section I of the paper deals primarily with the rationale of the concept of unemployment used in 
the current measurement of the labor force by the U. S. Bureau of the Census, and its relevance for 
major purposes of economic policy. Section II deals with outstanding questions of measurement other 
than those of sample design and sampling variability—whether in fact the Census survey succeeds in its 
purpose of enumerating all persons actually in the labor market with jobs or seeking work. Section III 
raises, briefly, a more fundamental question whether labor force statistics measuring labor market at- 
tachments under existing conditions of demand bring within the scope of measurement all those persons 
“able, willing, and seeking to work” described by the Congress as within the scope of national economic 
policy. Section IV considers needs for additional information on unemployment, and concludes that 
more or less additional data of various kinds are needed depending upon the nature of economic policy— 
whether largely limited to fiscal and monetary policies directed toward stabilizing or expanding aggregate 
demand or aimed more directly toward aid to particular groups, industries, or geographic areas. 


Analyses of Experiments with Correlated Observations and Heterogeneous Variances. H. C. Sweeny, 
Virginia Polytechnic Institute. 

Methods for the analysis of experiments when the usual assumptions of independence between 
plots and homogeneous variances are not tenable have been investigated, following the method noted by 
F. Graybill for the analysis of randomized blocks design. Detailed methods for analyzing factorial ar- 
rangements and split plot designs are advanced, and the various properties of the tests of significance in 
relation to the usual analysis of variance have been investigated. Extension to the case of incomplete 
block designs is underway. The method depends upon a transformation of the data, and an application 
of Hotelling’s $T^2$ as given by P. L. Hsu to test homogeneity of means. 


Problems in the Estimation of the Population Group Subject to Benefit from a Mass Therapeutic Trial. 

Donovan J. Thompson, University of Pittsburgh. 

The estimation of the prevalence of a disease entity or condition in a large population group from 
survey data frequently must be undertaken in the planning or during the execution of a mass therapeutic 
trial. Experience gained at the Graduate School of Public Health, University of Pittsburgh in the estima- 
tion of prevalence of two chronic diseases, arthritis and heart disease, by three survey methods indicates 
that major differences exist between methods. The magnitude of the procedural biases (due to the so- 
called non-sampling errors) of two types of interview surveys is estimated in terms of a “true value” 
defined operationally by the findings of a physician during a physical examination of the same persons 
included in the two interview surveys. 

Preliminary analyses of the data from two separate studies (arthritis and heart disease) indicate 
that the “true value” for the prevalence figure is in the range of 2 to 10 times that reported by a house- 
hold survey employing any responsible adult as the respondent, or a survey employing a randomly 
selected individual in the household reporting only for himself. In addition, treating the physician’s 
evaluation as the true condition, the household interview was found to give 16.5% false positive reports 
and 31.2% false negative reports for arthritis or rheumatism, while the corresponding figures for the 
personal interview of each sampled individual are 17.6% and 24.4%. Similar results were obtained for 
heart disease. The findings also suggest that the higher prevalence figures obtained from analyzing the 
reports by a household respondent for his own health condition as against the health condition of other 
members of the household may well be explained in terms of the higher probability of an ill person 
being at home and hence more likely to become a respondent on a household survey. 
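
A small Python sketch may make the error measures in this abstract concrete. The abstract does not state the denominators behind its percentages, so the definitions below are assumptions: the false positive share is taken as the fraction of interview-reported cases not confirmed by the physician, and the false negative share as the fraction of physician-confirmed cases not reported in the interview. The function name and the counts are hypothetical, not data from the study.

```python
def interview_error_rates(both, interview_only, exam_only):
    """Compare interview reports with a physician's examination of the same persons.

    both           -- persons positive by interview and by examination
    interview_only -- positive by interview, not confirmed by examination
    exam_only      -- found by examination, not reported in the interview
    """
    reported = both + interview_only   # positives according to the interview
    confirmed = both + exam_only       # positives according to the physician
    false_positive_share = interview_only / reported
    false_negative_share = exam_only / confirmed
    prevalence_ratio = confirmed / reported  # "true value" relative to the survey figure
    return false_positive_share, false_negative_share, prevalence_ratio


# Hypothetical counts for illustration only.
fp, fn, ratio = interview_error_rates(both=60, interview_only=15, exam_only=40)
```

With these made-up counts the survey misses more cases than it over-reports, so the examination-based prevalence exceeds the survey figure (ratio above 1), which is the direction of bias the abstract describes.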


A Topic in Variance Components Analysis. W. A. Thompson, Jr., Virginia Polytechnic Institute. 

A lemma is proved which may sometimes be used to find the class of all statistics whose distributions 
are independent of the nuisance parameters. The least squares model with errors arising from two 
sources is then discussed, and the lemma is then applied to this case. These results are then specialized 
to partially balanced incomplete block designs. 


Statistics and Educational Research. David V. Tiedeman, Harvard University. 

Consider N people on each of whom is available a series of k observations. This observation matrix 
of N rows and k columns may be partitioned in a number of ways. Typical educational and psychological 
problems lead to various partitions and procedures. 

Following Bartlett, an attempt was made to show the interrelations among a number of these 
procedures. The theory of canonical correlation provided the unifying principles. It was recommended 
that: 1. Additional training in the theory of matrix algebra be incorporated in the training of educational 
researchers; 2. Further training in multivariate statistical techniques be incorporated in the training 
of educational researchers; 3. An effort be made to find techniques for analyzing the dispersion of multi- 
ple variables analogous to available procedures for analyzing the variance of a single quantitative 
variable; 4. A technique for studying the trace which the points for N people make through time in a 
T-dimensional space be sought; and 5. Ways and means of obtaining and supporting a large electronic 
computer for educational research be considered. 


The Use of Statistics in the Physical Sciences from the Standpoint of Industry. J. H. Toulouse. 


The use of “statistical methods in the physical sciences” cannot be characterized by a single meas- 
ure. The range of use is extreme—from organizations which use statistical methods in every way 
possible—to those to whom they are virtually unknown. 

A small questionnaire was used to sample industrial organizations, and of the 423 sent out 253 were 
returned, about equally divided between engineers and scientists. A name was chosen on approximately 
each twelfth page of American Men of Science, and each fifteenth page of Who's Who in Engineering. 
The name chosen was that of an individual whose title or description indicated the field of development 
or research in industry, but not necessarily the manager of a department. 

Questions were chosen to indicate interest as well as usage. Answers were generally developed as 
“Yes” or “No”, with some possibility of qualification. The results can be modified by the fact that 9 
scientists and 86 engineers failed to return the questionnaire. 


Industrial Statistics Needed for Mobilization Planning. William Truppner, Dept. of Commerce. 


Experience in both World War II and in the Korean War provides overwhelming evidence that the 
task of establishing an effective system of production and material controls will take almost a year, if the 
government has to use the methods developed in the past. Preparation for a future emergency demands 
that a method for mobilizing our industrial plant be devised for accomplishing this in a much shorter 
period of time. 

Delay in the past has been due primarily to the problems associated with the establishment of 
production control machinery in the form of orders and regulations to carry out government decisions 
during a war emergency. Second, delay was occasioned by the inability of the government to measure 
the impact of those decisions on economic resources. 

The development of a production and material control system for use in a future emergency represents 
a cornerstone of the mobilization preparedness program. The necessity for making the rules effective 
in a short period of time requires that there be a minimum of change from normal peace-time materials 
purchasing and production scheduling procedures. 

The development of a standby control system for such use is accompanied by the accumulation of 
statistical measures for translating given program levels into the material tonnage requirements needed 
to support them. Upon completion of this phase of the preparedness work, it is felt that the two major 
reasons for past delay in achieving rapid industrial mobilization will have been eliminated. 


Some Aspects of Estimation Theory (Preliminary Report). M. C. K. Tweedie, Virginia Polytechnic 
Institute. 


W. L. Stevens (Biometrika, 37 (1950), 117-29) has published a method of finding fiducial limits for 
the parameter of a discontinuous distribution for which the probability of covering the true value can be 
fixed at a desired level in (0, 1), by utilizing an auxiliary random variable Y which is uniformly distributed 
over (0, 1). As an alternative to his solution, set up an “acceptability function” (a.f.) $a_\delta(\theta \mid x)$, 
which ranges from 0 to 1 as $\theta$ varies, with the property that the parameter value $\theta$ is covered by the 
chosen estimate set (or confidence interval or fiducial interval), derived from observed data $x$ by a 
method $\delta$, if the observed value of Y as above is not greater than $a_\delta(\theta \mid x)$. Stevens's result can be used 
to construct an a.f. Such functions might be used as generalizations of confidence intervals for presenting 
inferences in estimation. The total probability of covering $\theta$ is $A_\delta(\theta \mid \theta^*) = E_x\{a_\delta(\theta \mid x) \mid \theta^*\}$, where $\theta^*$ is 
the true value. Estimation bias may be said to occur if $A_\delta(\theta \mid \theta^*) > A_\delta(\theta^* \mid \theta^*)$ for some $\theta \neq \theta^*$. The a.f. 
derived from Stevens's solution for a binomial distribution is biased but an unbiased modification has 
been found. Further developments of the general idea are under examination. 


Analysis of the Variability of Growth of Filarial Worms. Mary G. Westbrook and J. Allen Scott. 


A useful criterion of immunity of cotton rats to infections with the filarial worm, Litomosoides 
carinii, is the slower growth of the worms in immune as compared to non-immune hosts. An analysis of 
the variability of growth in non-immune animals was attempted to provide improved experimental 
design and possibly reduce the number of parallel control animals needed. Since a separate experimental 
series to provide data for such an analysis would be prohibitively expensive, measurements of 1583 worms 
from rats serving as controls to other experiments were utilized. 

Pearl-Reed logistic curves gave a satisfactory fit to the means of all worms for each day of age when 
the goodness of fit was based on the variability of the means about the fitted curve. Confidence bands 
were desired but impossible to establish because of heterogeneity of variance and the many components 
of variance involved. 
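
As an illustration of the curve named above, the following Python sketch evaluates a Pearl-Reed logistic of the form $y = K / (1 + e^{a + bt})$ and fits $a$ and $b$ to mean sizes by ordinary least squares on the transformed values $\ln(K/y - 1) = a + bt$. The fitting procedure actually used by the authors is not described in the abstract; treating the asymptote $K$ as known, the function names, and the numbers are all assumptions made for the sketch.

```python
import math


def logistic(t, k, a, b):
    """Pearl-Reed logistic curve: size at age t with upper asymptote k."""
    return k / (1.0 + math.exp(a + b * t))


def fit_logistic_known_k(ages, sizes, k):
    """Least-squares estimates of a and b from the line ln(k/y - 1) = a + b*t.

    Every size must lie strictly between 0 and k for the transform to exist.
    """
    z = [math.log(k / y - 1.0) for y in sizes]
    n = len(ages)
    t_bar = sum(ages) / n
    z_bar = sum(z) / n
    sxx = sum((t - t_bar) ** 2 for t in ages)
    sxz = sum((t - t_bar) * (zi - z_bar) for t, zi in zip(ages, z))
    b = sxz / sxx
    a = z_bar - b * t_bar
    return a, b


# Hypothetical daily means generated from a known curve, then recovered by the fit.
ages = [10.0, 20.0, 30.0, 40.0, 50.0]
sizes = [logistic(t, k=80.0, a=3.0, b=-0.1) for t in ages]
a_hat, b_hat = fit_logistic_known_k(ages, sizes, k=80.0)
```

The sketch treats every daily mean as equally reliable. With heterogeneous variances of the kind the abstract reports, a weighted fit would be the natural refinement.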

The growth of some free living animals can be studied by successive measurement of the same indi- 
viduals. These internal parasites, however, are sacrificed when they are recovered and measured. 
Furthermore, the specific environment in which they were living, i.e., the host, is likewise destroyed. 
Therefore, growth studies must depend on means of the measurements of groups of worms from a series 
of hosts. The principal components of variance are then (1) within rat variance, (2) between rat variance, 
(3) between day variance due to regression and (4) between day variance not due to regression. Even if 
the data had been derived from an experiment designed for this purpose, lack of orthogonality and 
heterogeneity of variance would have been inherent. With the data available analysis of variance involving 
all of these components was clearly impossible. It was possible, however, to show that both the 
within rat variance and between rat variance were significant. Furthermore, sufficient data were available 
for one time period to further analyze the between rat variance and show significant effects of the 
age and sex of the rats. 


Fixed, Mixed and Random Models. M. B. Wilk and O. Kempthorne, Iowa State College. 


A procedure is exemplified by which the appropriate linear model for the analysis of a randomized 
experiment can be derived. A population model is defined from the experimental situation, and a sta- 
tistical model obtained by imposing the conditions of the design. The physical meaning and the proper- 
ties of the components of the model are discussed in detail. Using the statistical model, the expectations 
of the analysis of variance mean squares are obtained and, on the basis of these, proper error terms are 
identified. 

Two examples are considered: (i) a general two factor experiment in a completely randomized design, 
with fixed, mixed and random models included as special cases; (ii) a more complex experiment described 
by Vaurio and Daniel which has been discussed in detail by Scheffé, for which we contrast our 
model and analysis with Scheffé’s. 

Some discussion is given of the advantages and problems associated with the method for model 
derivation presented. 


The Use of Incomplete Block Designs for Factorial Experiments. Marvin Zelen, National Bureau of 
Standards. 


The use of factorial designs has now become widely accepted as an efficient way for carrying out 
experiments involving several factors where each factor is varied at several levels. However, one of the 
difficulties of carrying out factorial experiments is that many of the block arrangements available are 
for designs having the same number of levels (i.e., $p^n$ series). There are few block arrangements for mixed 
factorial designs (i.e., factors not all at the same level). The purpose of this paper is to show how incomplete 
block designs (particularly the partially balanced designs) can be used to obtain designs for mixed 
factorial systems and in some cases for obtaining fractional replicated designs for mixed systems. 

An example is given showing the experimental arrangement for a 3×5 factorial design in six blocks 
of five experimental units which was used to investigate certain techniques involved in the development 
of printed circuits. 


All communications concerning this section should be addressed to 
the Abstracts Editor, Professor George E. Nicholson, Jr., Chairman of 
the Department of Statistics, University of North Carolina, Chapel 
Hill, North Carolina. 


Aitchison, J. A. and Brown, J. A. C., “An 
estimation problem in quantitative assay,” 
Biometrika, 41 (1954), 338. 


The model proposed by Finney for the 
quantitative response, u, to a stimulus, z, 
implies that the variance of u shall be inde- 
pendent of z. The authors have found that 
for certain economic data this model is un- 
satisfactory, and that it is more accurate to 
regard the coefficient of variation of u as 
constant. They therefore propose a new 
model which uses multiplicative as opposed 
to additive errors, and show that for this 
model the variance of u has the property re- 
quired by their economic data. The paper 
then provides a probit method of estimating 
the parameters of the new model. There is a 
worked example. Walter L. Smith, University 
of North Carolina. 


Anscombe, F. J., “Fixed sample-size analy- 
sis of sequential observations,” Biometrics 
10 (1954), 89-100. 


In many cases the investigator sets out to 
take observations, but has no fixed idea of 
how many he shall take before stopping. He 
may stop at the end of a certain period of 
time, or at the expiration of certain funds, 
or upon some other occasion, which means 
that his sample size is a random variable. 
This paper concerns itself with the consequences 
of analyzing N observations (N 
random) by conventional methods (for 
which N is fixed). If the number of observa- 
tions does not depend upon the observations 
themselves, then the use of fixed sample 
size methods introduces no bias or other 
difficulty. If N does depend upon the values 
of the observations then several cases must 
be distinguished. (a) First, it is possible to 
set oneself the rule “take samples until at- 
taining a significant result (by fixed sample 
size analysis) for the data in hand.” This 
problem is related to non-publication of 
“negative” results. (b) The experimenter 
may take one fixed sample of size $n_1$, and 
then a second sample of size $n_2$ only if the 
first sample gives a value lying in a normal 
range (e.g. between —a and a, or positive). 


(c) The experimenter may take observa- 
tions sequentially until he obtains a (fixed 
sample size) confidence interval of pre- 
assigned width. Errors arising in these three 
cases are discussed. The author concludes 
that appreciable error may be suspected 
when both: 

(1) “The number of observations de- 
pends on the observations them- 
selves.” 

(2) “The relative dispersion of the num- 
ber of observations in repeated sam- 
pling is not very small.” 


Condition (2) would hold in (a) and (b) 
above, but not in (c) if the preassigned 
length were quite short. L. E. Moses, 
Stanford University. 
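
Case (a) of this abstract, sampling until a fixed-sample-size test first gives a significant result, can be illustrated by simulation. The Python sketch below is not taken from the paper: the test (a two-sided z-test of a zero mean with known unit variance), the limit of 50 observations, the number of trials, and the seed are all assumptions chosen for the illustration. The null hypothesis is true throughout, so every rejection is a false one.

```python
import math
import random


def rejects_fixed(rng, n, z_crit=1.96):
    """Two-sided z-test of mean 0 (variance 1 known) on exactly n observations."""
    total = sum(rng.gauss(0.0, 1.0) for _ in range(n))
    return abs(total) / math.sqrt(n) > z_crit


def rejects_optional(rng, n_max, z_crit=1.96):
    """Apply the same test after every observation; stop at the first 'significant' result."""
    total = 0.0
    for n in range(1, n_max + 1):
        total += rng.gauss(0.0, 1.0)
        if abs(total) / math.sqrt(n) > z_crit:
            return True
    return False


def rejection_rates(trials=2000, n_max=50, seed=1):
    """Proportion of false rejections under fixed-N sampling and under optional stopping."""
    rng = random.Random(seed)
    fixed = sum(rejects_fixed(rng, n_max) for _ in range(trials)) / trials
    optional = sum(rejects_optional(rng, n_max) for _ in range(trials)) / trials
    return fixed, optional


fixed_rate, optional_rate = rejection_rates()
```

Under the fixed-N rule the false rejection rate stays near the nominal 5 per cent, while the optional-stopping rule rejects several times as often. This is consistent with the author's two conditions: here the number of observations depends on the observations themselves, and it varies widely from one repetition to the next.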


Arbous, A. G. and Sichel, H. S., “New 
techniques for the analysis of absenteeism 
data,” Biometrika, 41 (1954), 77-90. 


The authors suggest as a model for ab- 
sence-proneness a symmetrical bivariate 
negative binomial distribution. For applica- 
tion to industry needs, it was found that ac- 
cumulating one year’s data is necessary be- 
fore preventive action can be taken on 
absenteeism. These data are halved and by 
use of correlation techniques the phenome- 
non of proneness is established. The ob- 
served distribution provides estimates that 
are used for prediction purposes for the 
following year. 

The model may not be the correct one to 
use, the authors say, but it is sufficiently ac- 
curate for practical purposes provided the 
two consecutive exposure periods do not 
exceed one year. Prediction of the absences 
of individuals in the second year (before 
they have actually occurred) is carried out 
with efficiency which can be established in 
terms of operating characteristics. How- 
ever, when remedial measures are applied 
at the end of the first year (before collecting 
the second year’s data) it will not be possi- 
ble to test the efficiency of that model. 
Means of assessing the efficiency of the remedial 
measures taken are not available at 
present, but the authors indicate possible 
lines of investigation along which a solution 
may be reached. A. R. KHAtI1, North Caro- 
lina State College. 


Auble, Donavon, “Extended tables for the 
Mann-Whitney statistic,” Bulletin of the 
Institute of Educational Research, Indiana 
University, Vol. 1, No. 2 (1953), i-iii and 
1-39. 


Auble, John Donavon, “Extended tables for 
the Mann-Whitney statistic and illustrative 
applications of certain nonparametric 
tests of significance.” Studies in Education 
1962, Thesis Abstract Series, School of Edu- 
cation, Indiana University, No. 4 (1953), 
9-13. 


Critical values are given for the Wilcoxon 
(sometimes called Mann-Whitney) rank 
order test for equality of the populations 
from which two samples come. All sample 
size combinations from 1 through 20 are 
covered. One-sided significance levels for 
which critical values are given are [0.1%], 
0.5%, 1%, [2%], 2.5%, [4%], 5% and 
[10%]. (Bracketed per cents are not given 
in the second reference.) A general discus- 
sion of the Wilcoxon (Mann-Whitney) test 
is given together with (in the first reference) 
an example of its use. The complete distri- 
butions computed by the author may be 
obtained by ordering Document 3912 from 
ADI Auxiliary Publications, Photoduplica- 
tion Service, Library of Congress, Washing- 
ton 25, D. C. The charge is $4.25 for micro- 
film or $12.50 for photoprints readable 
without optical aid. William Kruskal, 
University of Chicago. 
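
For readers who wish to see how critical values of this kind arise, the following Python sketch computes the Mann-Whitney statistic U and its exact null distribution for small samples without ties. It is a generic textbook recursion, not the computing method used for the tables under review, and the function names are hypothetical.

```python
from functools import lru_cache


def u_statistic(xs, ys):
    """Number of (x, y) pairs with x > y, counting ties as one half."""
    return sum((x > y) + 0.5 * (x == y) for x in xs for y in ys)


@lru_cache(maxsize=None)
def count_arrangements(m, n, u):
    """Number of orderings of m x's and n y's (no ties) giving U == u."""
    if u < 0:
        return 0
    if m == 0 or n == 0:
        return 1 if u == 0 else 0
    # The largest observation is either an x (it exceeds all n y's)
    # or a y (it adds nothing to U).
    return count_arrangements(m - 1, n, u - n) + count_arrangements(m, n - 1, u)


def lower_tail_probability(m, n, u):
    """Exact P(U <= u) under the null hypothesis, for a whole-number u."""
    total = sum(count_arrangements(m, n, v) for v in range(m * n + 1))
    tail = sum(count_arrangements(m, n, v) for v in range(int(u) + 1))
    return tail / total


# Two samples of three in which every x lies below every y.
u = u_statistic([1, 2, 3], [4, 5, 6])
p = lower_tail_probability(3, 3, u)
```

With two samples of three there are only 20 equally likely orderings, so the most extreme arrangement has a one-sided probability of 1/20. No smaller one-sided level can be attained at these sample sizes.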


Bechhofer, R. E., Dunnett, C. W. and Sobel, 
M., “A two-sample multiple decision pro- 
cedure for ranking means of normal popula- 
tions with a common unknown variance,” 
Biometrika, 41 (1954), 170-6. 


Consider k normal populations $\{\pi_i\}$ with 
unknown means $\mu_i$ and variances $\sigma_i^2 = a_i\sigma^2$ 
($\sigma^2$ unknown). This paper is concerned with 
a two-sample procedure for ranking the 
populations according to the means with 
goals the selections of (1) the population 
with the largest population mean and (2) 
the populations having the largest, 2nd 
largest, ..., smallest population mean. A 
first sample of $a_i N_0$ is taken for the i-th 
population and $S_0^2$ (unbiased estimate of 
$\sigma^2$) is calculated with $n = N_0 \sum_{i=1}^{k} a_i - k$ 
degrees of freedom. A second sample 
$(N - N_0)a_i$ is taken such that 
$N = \max[N_0,\ \text{smallest integer} \ge 2S_0^2 (h_n/\min\delta)^2]$ where 
$\min\delta = \min(\mu_{(k)} - \mu_{(k-1)})$ for goal (1) and 
$= \min(\mu_{(i)} - \mu_{(i-1)})$ for goal (2) to be detected 
and $h_n$ is a positive constant involving 
the goal, n and the probability specified 
for achieving the goal when $\delta \ge \min\delta$. By 
properly choosing $h_n$, the solution is to rank 
the population according to the overall sample 
means for goal (2) and select the one 
with the largest overall sample mean for 
goal (1). Values of $h_n$ have been derived for 
$k = 3$; for $k \ge 4$, the integral remains to be 
solved for $h_n$. An expression for the expected 
sample size N has been derived and a 
particular case represented graphically. 
Two examples are considered, but in one the 
first sampling procedure has been substituted 
by an earlier estimate of $\sigma^2$. V. P. 
Shah, North Carolina State College. 
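
The second-stage rule in this abstract reduces to a one-line calculation, sketched below in Python. The rule as read here is $N = \max[N_0,\ \text{smallest integer} \ge 2S_0^2 (h_n/\min\delta)^2]$, with a second sample of $(N - N_0)a_i$ from the i-th population. The function name and all numerical inputs are hypothetical. In particular the value used for $h_n$ is a placeholder and not a tabulated constant from the paper.

```python
import math


def total_sample_multiplier(n0, s0_squared, h, min_delta):
    """N = max(N0, smallest integer >= 2 * S0^2 * (h / min_delta)^2)."""
    needed = math.ceil(2.0 * s0_squared * (h / min_delta) ** 2)
    return max(n0, needed)


def second_stage_sizes(n0, s0_squared, h, min_delta, a):
    """Additional observations (N - N0) * a_i to take from each population."""
    n_total = total_sample_multiplier(n0, s0_squared, h, min_delta)
    return [(n_total - n0) * a_i for a_i in a]


# Hypothetical inputs: first-stage N0 = 10, variance estimate 4.0,
# placeholder constant h = 2.0, smallest difference worth detecting 1.0.
n_total = total_sample_multiplier(10, 4.0, 2.0, 1.0)
extra = second_stage_sizes(10, 4.0, 2.0, 1.0, a=[1, 1, 2])
```

When the first-stage variance estimate is small the rule returns $N = N_0$ and no second sample is taken.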


Cadwell, J. H., “The probability integral of 
the range for samples from a symmetrical 
unimodal population,” Annals of Mathe- 
matical Statistics, 25 (1954), 803-6. 


The author gives an asymptotic expres- 
sion for the probability integral of the range 
for samples from a symmetrical unimodal 
population. He calculated the maximum 
error for two expressions by taking the dif- 
ferences between exact values obtained by 
evaluating the p.d.f. and values found by 
quadrature. He found that his second expression 
gives results of reasonable accuracy. 
He then calculated a table which 
gives corrections to units in the fourth 
decimal place to be added to the approxi- 
mate value for five sample sizes. One can 
interpolate graphically for n and the ap- 
proximate value if the correction is plotted 
against the approximate probability on 
arithmetical probability paper. He also 
tabulated the percentage points of the range 
found by quadrature. This will help in 
making preliminary estimates. A. E.
Sarhan, University of North Carolina.


Calvin, Lyle D., “Doubly balanced incom- 
plete block designs for experiments in which 
the treatment effects are correlated,” Bio- 
metrics 10 (1954), 61-8. 


The mathematical model underlying the 
design postulates, for every pair of treatments
i and j, the existence (and allows the
estimation) of an effect arising from the
presence of both treatments in the same
block, which effect acts oppositely upon the
elements given treatments i and j; this correlation
parameter is assumed the same in
every block. 

The model is more general than the usual 
one for balanced incomplete blocks. The 
analysis is simplified by imposing the condition
that not only does every pair of treatments
appear equally often together, but
also every triple of treatments does.
This gives the name doubly balanced incomplete
blocks.
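
The double-balance condition can be checked mechanically for any candidate design. The sketch below is illustrative only and is not taken from the paper: the helper names are hypothetical, and the example design (the 14 four-element blocks on 8 treatments formed by the planes of AG(3, 2)) is an assumption chosen because its pair and triple counts are easy to verify.

```python
from itertools import combinations
from collections import Counter


def concurrence_counts(blocks, size):
    """Count how often each `size`-subset of treatments shares a block."""
    counts = Counter()
    for block in blocks:
        for subset in combinations(sorted(block), size):
            counts[subset] += 1
    return counts


def is_doubly_balanced(blocks, treatments):
    """True if every pair and every triple of treatments occurs equally often."""
    for size in (2, 3):
        counts = concurrence_counts(blocks, size)
        values = {counts[s] for s in combinations(sorted(treatments), size)}
        if len(values) != 1:
            return False
    return True


# Illustrative design (an assumption, not from the paper): all 4-subsets of
# {0, ..., 7} whose elements XOR to zero, i.e. the planes of AG(3, 2).
treatments = range(8)
design = [b for b in combinations(treatments, 4) if b[0] ^ b[1] ^ b[2] ^ b[3] == 0]

print(len(design), is_doubly_balanced(design, treatments))
```

The same function rejects an ordinary balanced incomplete block design whose triples are unevenly replicated, which is the distinction the abstract draws.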

The construction of such designs is in- 
vestigated. Designs rarely exist for small r. 
In organoleptic tests small block size is 
necessary, but much replication (large r) is 
ordinarily desirable. Usually the number of 
treatments, p, is less than 20. 

Several designs are given. Others are indexed.
The analysis is derived and a worked
example is given. L. E. Moses, Stanford
University.


Cox, D. R., “The design of an experiment in 
which certain treatment arrangements are 
inadmissible,” Biometrika, 41 (1954), 287. 


Suppose that the experimental units of an 
investigation are arranged in sets of k units, 
and that the k units within a set have some
definite ordering (e.g., in time). If system- 
atic differences exist between the serial 
positions within a set and if there exists an 
appreciable component of variance between 
sets, then in a comparison of a number of 
treatments we would ordinarily use a Latin 
square or a Youden square. However, it 
may be impracticable to apply treatments 
in certain orders, and when this is so these 
standard designs are useless. This paper dis- 
cusses what can be done in such circum- 
stances and, on the basis of the usual as- 
sumptions in this field, a method is derived 
for constructing appropriate designs. For a 
few special cases, designs are listed. Walter
L. Smith, University of North Carolina.


Cox, D. R., “The mean and coefficient of 
variation of range in small samples from 
non-normal populations,” Biometrika, 41 
(1954), 469. 


By examining special populations, a table 
is obtained for predicting approximately the 
mean and coefficient of variation of the 
range of random samples of any size up to 
five, drawn from a population of specified 
kurtosis. The effect of non-normality on 
various statistical methods that use range is 
then considered. [Author’s Summary.] 
Walter L. Smith, University of North
Carolina.


Cox, D. R. and Smith, W. L., “On the super- 
position of renewal processes,” Biometrika, 
41 (1954), 91-9. 


Suppose that there are a number of independent
sources at each of which events occur
from time to time. The intervals between
successive events at any one source
are assumed to be independent random
variables all with the same distribution, so
that each source constitutes a renewal
process. Equilibrium behavior a long time
from the beginning of the process for single
and multiple sources is studied. In particular,
for a single source, expressions are obtained
for (i) the delay function or distribution
of time measured back to the immediately
preceding event, (ii) the frequency
function of the time interval between successive
events and (iii) the probability distribution
and variance of the number of
events occurring in time intervals of a given
length.

Similar statistical properties are studied 
for the pooled output of N sources for both 
finite and infinite N. The usefulness of the 
results for studying the underlying struc- 
ture of a sequence of observed events result- 
ing from a pooled output of N sources is 
illustrated with neuro-physiology data. 
D. G. Horvitz, North Carolina State College.
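
A small simulation may help fix the idea of a pooled output. The sketch below is an assumption-laden illustration, not the authors' analysis: the uniform interval law, the horizon, and the start-up period discarded before measuring gaps are all hypothetical choices. It merges the event times of N independent renewal sources and reports the coefficient of variation of the gaps between successive pooled events, which would be expected to move toward 1 as N grows if the pooled stream becomes more nearly Poisson.

```python
import random


def renewal_events(draw_interval, horizon, rng):
    """Event times of one renewal process on (0, horizon]."""
    times, t = [], draw_interval(rng)
    while t <= horizon:
        times.append(t)
        t += draw_interval(rng)
    return times


def pooled_output(n_sources, draw_interval, horizon, rng):
    """Merge the event times of n independent sources into one ordered sequence."""
    events = []
    for _ in range(n_sources):
        events.extend(renewal_events(draw_interval, horizon, rng))
    return sorted(events)


def coefficient_of_variation(values):
    """Sample standard deviation divided by the sample mean."""
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / (len(values) - 1)
    return var ** 0.5 / mean


def uniform_interval(rng):
    """Hypothetical interval law: uniform on (0.5, 1.5)."""
    return rng.uniform(0.5, 1.5)


rng = random.Random(1954)
for n in (1, 5, 50):
    pooled = pooled_output(n, uniform_interval, horizon=300.0, rng=rng)
    settled = [t for t in pooled if t > 100.0]   # discard the start-up period
    gaps = [b - a for a, b in zip(settled, settled[1:])]
    print(n, round(coefficient_of_variation(gaps), 2))
```

All sources start at time zero here, so the early events are not in equilibrium; dropping an initial stretch is a crude stand-in for the "long time from the beginning" condition in the abstract.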


Cochran, William G., “The combination of 
estimates from different experiments,” 
Biometrics 10 (1954), 101-27. 


This paper considers the problem of combining
a number of estimates of a quantity
$\mu$. The estimates may be called $x_i$, and with
each $x_i$ is given $s_i^2$, an unbiased estimate of
the variance of $x_i$, based on $n_i$ degrees of
freedom.

The first problem to be considered is
whether the $x_i$ agree among themselves; if
there is significant heterogeneity, i.e., interaction,
then the choice of a sensible method
of combination requires careful thought.

Cases where the variance per observation
is the same for all $x_i$ are distinguished from
cases where the variances are unequal. Rules are
given for choosing from among the weighted
mean, semi-weighted mean, partially
weighted mean, and unweighted mean. L. E.
Moses, Stanford University.
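
The two simplest candidates named above can be sketched as follows. This is a minimal illustration under stated assumptions, not the rules given in the paper: the weights are taken as the reciprocals of the estimated variances, the heterogeneity measure is the usual weighted sum of squares about the weighted mean, and the data are hypothetical. The semi-weighted and partially weighted means are not reproduced.

```python
def unweighted_mean(estimates):
    """Simple average of the estimates."""
    return sum(estimates) / len(estimates)


def weighted_mean(estimates, variances):
    """Mean with weights inversely proportional to the estimated variances."""
    weights = [1.0 / v for v in variances]
    return sum(w * x for w, x in zip(weights, estimates)) / sum(weights)


def heterogeneity_statistic(estimates, variances):
    """Weighted sum of squared deviations about the weighted mean."""
    center = weighted_mean(estimates, variances)
    return sum((x - center) ** 2 / v for x, v in zip(estimates, variances))


x = [10.0, 12.0]   # hypothetical estimates
s2 = [1.0, 4.0]    # hypothetical variance estimates
print(unweighted_mean(x), weighted_mean(x, s2), heterogeneity_statistic(x, s2))
```

A large heterogeneity value relative to its reference distribution is the signal, in the abstract's terms, that the choice of combining method "requires careful thought".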


David, H. A., “The distribution of range in
certain non-normal populations,” Biometrika,
41 (1954), 463.


The distribution of the sample range $w_n$
in random samples of size n from certain
special non-normal populations is considered.
Exact expressions for the expectation
of $w_n$ are obtained, and for the probability
of $w_n$ exceeding a given value. On the
basis of these and other results the author
draws some general inferences concerning
the effect of non-normality on the distribution
of range. Walter L. Smith, University
of North Carolina.


David, H. A., Hartley, H. O., and Pearson,
E. S., “The distribution of the ratio, in a
single normal sample, of range to standard
deviation,” Biometrika, 41 (1954), 482.


This paper provides tables of upper and 
lower percentage points of the ratio, u, of 
the range of a sample of size n from a nor- 
mal population to the standard deviation of 
that sample. These tables are obtained by 
two methods of calculation. The first uses 
known moments of the range and the stand- 
ard deviation, of samples of size n, to calcu- 
late the moments of the ratio. Pearson type 
curves are then fitted, and from these per- 
centage points are calculated. The second 
method is highly ingenious, and leads to 
exact upper percentage points. The ratio u 
is proposed as a test of homogeneity or 
normality of data, and numerical illustrations
are given of its use in these connections.
Walter L. Smith, University of North
Carolina.


Dodge, Harold F., “Chain Sampling Inspection
Plan,” Industrial Quality Control,
XI (January, 1955), 10-13.


Small sample sizes often require that the 
acceptance number equal zero. Such at- 
tribute sampling plans have certain unde- 
sirable characteristics. The author presents 
the following procedure as an alternative: 

a) For each lot, select a sample of n units 
(n specimens) and test each unit for con- 
formance to the specified requirement. 

b) Acceptance number of defects, c=0;
except c=1 if no defects are found in the
immediately preceding i samples of n
(i=1, 2, 3, ...).

The equations for the probability of ac- 
cepting a lot are derived and Operating 
Characteristic Curves are presented for 
n=4, 5, 6, and 10 and i=1, 2, 3, 4, 5.
GERALD J. LIEBERMAN, Stanford University. 
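
The acceptance rule above lends itself to a short numerical sketch. The expression used here is the commonly quoted one for this kind of chain sampling plan, assumed rather than taken from the abstract: the lot is accepted on zero defects, or on one defect provided the required number of preceding samples were defect-free, with binomial sampling at a constant fraction defective. The function and parameter names are hypothetical.

```python
def chain_acceptance_probability(p, n, lookback):
    """P(accept) = P0 + P1 * P0**lookback, with binomial P0 and P1.

    p        -- fraction defective (assumed constant from lot to lot)
    n        -- sample size per lot
    lookback -- number of preceding samples that must be defect-free
    """
    p0 = (1.0 - p) ** n                    # no defects in a sample of n
    p1 = n * p * (1.0 - p) ** (n - 1)      # exactly one defect in a sample of n
    return p0 + p1 * p0 ** lookback


for frac in (0.01, 0.05, 0.10):
    print(frac, round(chain_acceptance_probability(frac, n=5, lookback=2), 4))
```

Evaluating the function over a grid of fractions defective traces one operating characteristic curve; a larger look-back brings the curve closer to that of the plain zero-acceptance-number plan.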


Fucks, Wilhelm, “On nahordnung and 
fernordnung in samples of literary texts,” 
Biometrika, 41 (1954), 116-32. 


This paper discusses various measures of 
the relationship of consecutive elements in a 
text (Nahordnung) and of distant elements 
(Fernordnung). 

Nahordnung: The text was split into con- 
secutive groups of two words each and the 
number of syllables counted for each word. 
A matrix was constructed of the relative 
frequency of i syllables in one word and j
syllables in the other word of the pair. 
Using this matrix, the mean number of 
syllables and variance for each word of the
pair and the correlation between them were
computed. There are several misprints, and 
statements regarding the equality of means 
and the interpretation of the signs of the 
correlation are open to question. Measures 
of skewness and entropy are also intro- 
duced. Examples from several texts are in- 
cluded. 

Fernordnung: A matrix was constructed 
of the reciprocal of the average number of 
words (plus one) intervening between a 
word with i syllables and one with j syllables.
Various moments and other measures
were also computed using this matrix. 
Another measure of the relation of distant 
elements was a correlation similar to the 
autocorrelation based on the number of 
syllables in consecutive words, except there 
was no correction for the mean. If the number
of syllables in consecutive words is
designated as $f_i$, the correlation is presumably

$$q_t = \frac{\sum_i f_i f_{i+t}}{\sqrt{\sum_i f_i^2 \, \sum_i f_{i+t}^2}}, \qquad t = 1, 2, \ldots, A-1,$$

where there are a total of $A$ words. The
author uses a different notation for the summation,
but the above seems to be his intent.
R. L. Anderson, North Carolina
State College.
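
The uncorrected correlation described in this abstract can be computed directly from a list of syllable counts. The sketch below follows the reviewer's reading of the formula (products of counts t words apart, divided by the root of the two sums of squares, with no subtraction of the mean); the summation range over all valid pairs is an assumption, and the sample text is hypothetical.

```python
from math import sqrt


def uncorrected_autocorrelation(f, t):
    """Lag-t product-moment ratio of syllable counts, with no mean correction."""
    pairs = list(zip(f, f[t:]))            # (f_i, f_{i+t}) for every valid i
    num = sum(a * b for a, b in pairs)
    den = sqrt(sum(a * a for a, _ in pairs) * sum(b * b for _, b in pairs))
    return num / den


syllables = [1, 2, 1, 3, 1, 1, 2, 4, 1, 2]   # hypothetical syllable counts
print([round(uncorrected_autocorrelation(syllables, t), 3) for t in (1, 2, 3)])
```

Because the counts are all positive and the mean is not removed, the values stay close to 1 even for unrelated words, which is one reason the interpretation of such a measure needs care.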


Hammersley, J. M., and Morton, K. W., 
“The estimation of location and scale pa- 
rameters from grouped data,” Biometrika, 
41 (1954) 296. 


The distribution function $F(x)$ is unknown.
Successive experiments yield random
samples from populations whose distribution
functions are $F(\alpha x + \beta)$, the unknown
parameters $\alpha$ and $\beta$ varying from experiment
to experiment. The observational
data are grouped into not necessarily equal
intervals. It is desired to estimate the values
assumed by $\alpha$ and $\beta$ for each experiment,
and to obtain as good an idea as possible of
the true nature of $F(x)$. This paper provides
a method of attacking these problems.
Walter L. Smith, University of North
Carolina.


Hoel, Paul G., “A test for Markoff chains,” 
Biometrika, 41 (1954), 430. 


Bartlett has shown that certain frequency 
counts generated by Markoff chains are 
asymptotically normally distributed, and 
he was thus able to construct a likelihood 
ratio test for the goodness of fit of observational
data. A feature of Bartlett’s work is
that the transition probabilities are supposed
to be known functions of certain
parameters (finite in number) which can be 
estimated. The present paper constructs a 
goodness of fit test for the case in which the 
transition probabilities are completely un- 
known. The alternative hypotheses be- 
tween which the test discriminates are (i) 
that a given observation depends only on 
the preceding r observations, i.e., the process
is r-dependent; (ii) that the process is
(r-1)-dependent. Walter L. Smith, University
of North Carolina.


J. H. Gaddum, “Bioassays and mathe- 
matics,” Pharmacological Reviews, 5 (1953), 
87-134. 


This is a brief but surprisingly complete 
review of the various notions and methods 
peculiar to the statistics of bioassay. An 
excellent bibliography is included. L. E. 
Moses, Stanford University.


Fisher, Sir Ronald, “The analysis of variance
with various binomial transformations,”
with discussion by M. S. Bartlett,
F. J. Anscombe, W. G. Cochran, and
Joseph Berkson, Biometrics, 10 (1954), 130-51.


In many experimental setups leading to
an all-or-none response by the individual experimental
unit, it is desirable to make
inferences about some variate y, a function
of the binomial parameter, P, associated
with dose x. The probit, logit, angular
transform, and others, are examples of such
variates.

In each case estimation by maximum 
likelihood calls for an iterative procedure 
involving “working values” of the trans- 
form, and weighting coefficients, both of 
which change from cycle to cycle. 

The angular transformation and the 
square root transformation (for Poisson 
data), a limiting case of the former, have 
nearly constant variance independent of 
the parameter. For both, the amount of in- 
formation per observation is exactly con- 
stant. This property rather than approxi- 
mate variance stabilization is offered as the 
actual justification for adopting the trans- 
formation. Approximate methods such as a 
final (non-iterative) analysis in terms of the 
empirical transforms are rejected as need- 
lessly sacrificing an exact solution. 
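
The near-constant variance mentioned for the angular transformation can be seen from a first-order (delta-method) calculation. The sketch below is illustrative and uses hypothetical function names; it shows only the approximate variance property, not the exact-information argument that the paper offers as the actual justification.

```python
from math import asin, sqrt


def angular_transform(successes, n):
    """arcsin(sqrt(p)) in radians for an observed proportion p = successes / n."""
    return asin(sqrt(successes / n))


def delta_method_variance(p, n):
    """First-order variance of arcsin(sqrt(p_hat)) for a binomial proportion.

    Var(p_hat) = p(1 - p)/n and dy/dp = 1 / (2 sqrt(p(1 - p))), so the
    product is 1/(4n) whatever the value of p.
    """
    derivative = 1.0 / (2.0 * sqrt(p * (1.0 - p)))
    return (p * (1.0 - p) / n) * derivative ** 2


print(angular_transform(25, 100))
print([round(delta_method_variance(p, 50), 6) for p in (0.1, 0.3, 0.5, 0.9)])
```

The approximation deteriorates for proportions near 0 or 1 and for small n, which is consistent with the abstract's wording that the variance is only "nearly" constant.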

In the discussion following the paper the 
point is often raised that most frequently 
there is present more variation than any of 
these models would imply. The doubt thus 
cast on the precise applicability of the 
model reduces the desirability of trying to
attain any “exact” solution. L. E. Moses,
Stanford University.


Green, J. R., “A confidence interval for
variance components,” Annals of Mathematical
Statistics, 25 (1954), 671-85.


The application of variance components
in the interpretation of significance tests,
the selection of efficient sampling designs
and in other problems in genetics makes it
worthwhile to have knowledge of the reliability
of the estimates.

Estimates for the variances of the variance
component estimates are less reliable
and less informative than confidence intervals.

The author stated his problem as follows:
Given two statistics $u_1$ and $u_2$ which are independently
distributed as $\sigma_1^2\chi^2/r_1$ and
$\sigma_2^2\chi^2/r_2$ with $r_1$ and $r_2$ degrees of freedom
respectively, we want to find confidence
limits for $\sigma_1^2 - \sigma_2^2$ ($\sigma_1$ and $\sigma_2$ are unknown).
Four solutions were obtained. The first was
by a method similar to that used by Welch
and Aspin on the problem of comparing
two means and involves neglecting successively
higher powers of the reciprocal of one
of the degrees of freedom.

The second and third solutions were obtained
by methods involving the neglect of
successively increasing and decreasing
powers, respectively, of a nuisance parameter.
The fourth solution, a more accurate
one, was obtained by combining results
in the second and third solutions. The
accuracy of the different solutions was also
discussed.

To obtain a confidence limit in practice,
the author discussed a tabulation which,
as stated, is very laborious and has not been
constructed. A. E. Sarhan, University of
North Carolina.


Kamat, A. R., “Distribution theory of two 
estimates for standard deviation based on 
second variate differences,” Biometrika, 41 
(1954), 1-11. 


This paper deals with the approximate
distributions of the mean square and the
mean successive second differences, $\delta_2^2$ and
$d_2$, respectively, where $(n-2)\delta_2^2 = \sum(\Delta^2 x_i)^2$
and $(n-2)d_2 = \sum|\Delta^2 x_i|$ for successive samples
of n. It is assumed that normal deviations
with zero mean and variance $\sigma^2$ are
superposed upon a quadratic trend; $\delta_2/\sqrt{6}$
and $d_2\sqrt{\pi/12}$ are then used as estimators of
$\sigma$. Values of the standard deviation, $\beta_1$ and
$\beta_2$ for the distributions of $\delta_2^2$ and $d_2$ are
computed for n = 5, 7, 10 (5) 30, 40, 50.

Exact distributions are given for n = 4;
also the general characteristic equation of
the matrix of the quadratic form for
$(n-2)\delta_2^2$. R. L. Anderson, North Carolina
State College.
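
The two estimators of the standard deviation described in this abstract are simple to compute. The sketch below is a minimal illustration with hypothetical function names; the scaling constants rest on two standard facts for independent normal deviations, namely that a second difference has variance six times that of a single observation and that the mean absolute value of a normal variate is its standard deviation times the square root of 2/pi. The distribution theory of the paper is not attempted.

```python
from math import pi, sqrt


def second_differences(x):
    """Successive second differences x[i+2] - 2 x[i+1] + x[i]."""
    return [x[i + 2] - 2 * x[i + 1] + x[i] for i in range(len(x) - 2)]


def sigma_from_mean_square(x):
    """Root mean squared second difference, divided by sqrt(6)."""
    d = second_differences(x)
    return sqrt(sum(v * v for v in d) / len(d)) / sqrt(6.0)


def sigma_from_mean_absolute(x):
    """Mean absolute second difference, multiplied by sqrt(pi / 12)."""
    d = second_differences(x)
    return (sum(abs(v) for v in d) / len(d)) * sqrt(pi / 12.0)


series = [0.0, 1.0, 0.0, 1.0, 0.0]   # hypothetical observations
print(sigma_from_mean_square(series), sigma_from_mean_absolute(series))
```

A linear trend in the series is removed exactly by the differencing, so both functions return zero for data lying on a straight line.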


Kendall, M. G., “Two problems in sets of 
measurements,” Biometrika, 41 (1954), 560. 


If n random samples are drawn from a 
normal population of zero mean and unit 
variance, what is the variance of the sample 
of smallest absolute magnitude (i.e., near- 
est the mean)? The author calls this the 
“Angel Problem” and provides the exact 
answer for samples of size 2, 3, 4, 5, and 
asymptotic formulas for higher n. The re- 
sults of the computations are tabulated 
alongside results of sampling experiments 
by W. J. Youden. The author then considers
a related “Demon Problem.” Walter
L. Smith, University of North Carolina.


Mayne, Alan J., “Some further results in 
the theory of pedestrians and road traffic,” 
Biometrika, 41 (1954), 375. 


Several statistical problems are tackled, 
concerning the passage of pedestrians across 
a road, on the assumptions that the time 
intervals between the arrival of pedestrians 
have one given distribution, that the time 
intervals between the arrivals of vehicles 
have another given distribution, and finally 
that all these time intervals are statistically 
independent. In particular, the size of 
groups of pedestrians on an island between
two lanes of traffic is discussed, and the in- 
crease in “efficiency” which can be obtained 
by the introduction of extra islands is calculated.
Walter L. Smith, University of
North Carolina.


Patankar, V. N., “The goodness of fit of 
frequency distributions obtained from 
stochastic processes,” Biometrika, 41 (1954), 
450. 


When the standard formula for the $\chi^2$
statistic is used for grouped data certain
independence conditions have to be fulfilled
before we can resort to the usual tables to
obtain a test of goodness of fit. When these
independence conditions are not satisfied,
as is the case with data arising from stochastic
processes, we can either construct a more
complicated test of goodness of fit, which
takes account of the altered conditions, or
we can study the distribution of the familiar
$\chi^2$ statistic when the usual independence
conditions are not satisfied. Here the latter
alternative is adopted and the effect of the
serial dependence of observations on the
classical $\chi^2$ test is examined in some detail.
On the basis of certain normality assumptions
approximate distributions of $\chi^2$ are obtained
by the method of fitting moments.
The results are applied to two important
special processes. Walter L. Smith, University
of North Carolina.


Roy, S. N., “Some further results in simul- 
taneous confidence interval estimation,” 
Annals of Mathematical Statistics, 25 (1954), 
752-61. 


In continuation of the author’s previous
work along the same lines, confidence
bounds are given on (i) the set of all characteristic
roots of the dispersion matrix of a
p-variate normal population, (ii) the set of
all characteristic roots of $\Sigma_1\Sigma_2^{-1}$, where
$\Sigma_1$ and $\Sigma_2$ stand for the dispersion matrices
of two p-variate normal populations, (iii) all
bilinear functions $a'\Sigma_{12}\Sigma_{22}^{-1}b$, where $\Sigma_{12}$
stands for the covariance matrix between
a p-set and a q-set and $\Sigma_{22}$ for the dispersion
matrix of a q-set in a (p+q)-variate normal
population ($p \le q$), and $a'$ is any arbitrary
p-dimensional unit length row vector and $b$
any arbitrary q-dimensional unit length
column vector. In each case, the confidence
bounds are given in terms of the observations
and contain constants, with a joint
confidence coefficient greater than or equal
to a preassigned level. The author considered
some interesting special univariate
and bivariate cases by putting p=1 in case
(i) and p=q=1 in case (ii). He also showed
that in case (i) “all characteristic roots of
$\Sigma$” can be replaced by “all $a'\Sigma a$” and in case
(ii) “all characteristic roots of $\Sigma_1\Sigma_2^{-1}$” can
be replaced by “all $a'\Sigma_1 a/a'\Sigma_2 a$.” A. E.
Sarhan, University of North Carolina.


Somerville, Paul N., “Some problems of 
optimum sampling,” Biometrika, 41 (1954), 
420. 


This paper discusses the problem of de- 
ciding which of a given set of populations 
has the greatest mean value, and considers
the best balance between the opposing 
desiderata of economical sample size and of 
accuracy sufficient for the experimenter’s 
ultimate purpose. Certain general formulas 
are proposed to represent the cost of sam- 
pling and the loss incurred as a result of each 
of the various mistaken judgements which 
the experimenter could make. On the basis 
of this formulation, a theorem is proved 
which enables the experimenter to choose a 
sample size yielding the minimum expected 
loss. There is an application of the general 
results to the case where the sample means
are normally distributed. Walter L.
Smith, University of North Carolina.


Whittle, P., “On stationary processes in the 
plane,” Biometrika, 41 (1954), 434. 


The sampling theory of stationary proc- 
esses in space is not completely analogous 
to the well-established theory of stationary 
time series, due to the fact that the variate 
of a time series is influenced only by past 
values, while for a spatial process the de- 
pendence extends in all directions. This 
paper explores the consequences of this in- 
teresting fact and develops test and estima- 
tion procedures. The theory is applied to 
uniformity data for wheat and oranges. 
Walter L. Smith, University of North
Carolina.


Vaart, H. R., van der, “Some remarks on the 
power function of Wilcoxon’s test for the 
problem of two samples. I and II.” Pro- 
ceedings Koninklijke Nederlandse Akademie 
van Wetenschappen, 53 (1950), 494-520, 
also Indagationes Mathematicae, 12 (1950), 
146-172. 


Vaart, H. R., van der, “An investigation on 
the power function of Wilcoxon’s two sam- 
ple test if the underlying distributions are 
not normal,” Proceedings Koninklijke Ne- 
derlandse Akademie van Wetenschappen, 
Series A, 56, No. 5 (1953), 438-48, also 
Indagationes Mathematicae, 15, No. 5 (1953) 
438-48. 


The problem considered by these two 
papers is that of testing that two popula- 
tions, from which samples are drawn, are 
the same, against the alternative that they
are the same except for a translation. The
alternative may be either one-sided or two-sided.
The tests considered are the one- and
two-sided Wilcoxon (Mann-Whitney) tests.
The concepts of power function and unbiasedness
are discussed in general, together
with the relationship between the
derivatives of the power function and bias.

In the two parts of the first paper the 
author derives general expressions for the 
power function of the Wilcoxon tests and 
their first two derivatives. Numerical values 
of the first two derivatives at the null hy- 
pothesis of the Wilcoxon tests (for very 
small sample sizes) are compared with those 
of the two-sample t test when the distributions
are normal. These comparisons suggest
that the Wilcoxon tests are only slightly less
powerful than the t test for normal distributions,
very small sample sizes, and small
translations. The extent to which these 
comparisons carry over to moderate sample 
sizes or to larger translations is not dis- 
cussed. It is shown that the two-sided Wil- 
coxon test may be biased. 
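
For the very small sample sizes discussed here, the null distribution of the Wilcoxon (Mann-Whitney) statistic can be obtained by direct enumeration. The sketch below is an illustration under stated assumptions, not the author's derivation: the function names are hypothetical, ties are counted as one half, and only the lower-tail null probability is computed, not the power function or its derivatives.

```python
from itertools import combinations


def mann_whitney_u(x, y):
    """Number of (x_i, y_j) pairs with x_i > y_j, counting ties as one half."""
    return sum((a > b) + 0.5 * (a == b) for a in x for b in y)


def exact_lower_tail(x, y):
    """P(U <= observed U) under the null, enumerating every split of the pooled data."""
    pooled = list(x) + list(y)
    observed = mann_whitney_u(x, y)
    hits = total = 0
    for idx in combinations(range(len(pooled)), len(x)):
        chosen = set(idx)
        xs = [pooled[i] for i in idx]
        ys = [pooled[i] for i in range(len(pooled)) if i not in chosen]
        hits += mann_whitney_u(xs, ys) <= observed
        total += 1
    return hits / total


print(mann_whitney_u([1, 2, 3], [4, 5, 6]), exact_lower_tail([1, 2, 3], [4, 5, 6]))
```

The number of splits grows as a binomial coefficient in the sample sizes, so this brute-force approach is practical only in the small-sample setting the abstract describes.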

The second paper discusses the question 
of bias in more detail. It is shown that the 
two-sided Wilcoxon test will be unbiased if 
the two samples are of the same size. In the 
case of unequal sample sizes a sufficient, but 
not necessary, condition is found for the 
two-sided Wilcoxon test to be biased. This 
condition is a two-fold one, one part being 
on the distribution functions, the other on 
the two sample sizes and the significance 
level of the test. Two tables are given so 
that the sign of the bias can be determined 
for certain significance levels when the 
sample sizes are 10 and 5, or 8 and 7. John
P. Gilbert, University of Chicago.


Statistical Theory of Extreme Values and Some Practical Applications. E. J. Gumbel.
A Series of Lectures, National Bureau of Standards, Applied Mathematics
Series 33 (Washington, D.C.: U. S. Government Printing Office, 1954). Pp.
viii, 51. $0.40. Paper. See review article by Bradford F. Kimball, on pages 517-528.


Introduction to Mathematical Statistics. Second edition. Paul G. Hoel. New 
York: John Wiley and Sons, 1954. Pp. vii, 331. $5.00. 


Robert M. Kozelka, University of Nebraska


Having been born and raised on the first edition of this book (studied
from, assisted with, and taught from, at three different institutions),
the reviewer approaches the second edition with a divided mind. The new, 
more mathematical, chapters (2 and 3), which cover probability and the 
nature of statistical methods, are an important addition to the text. Further- 
more, there is a stronger flavor of mathematics throughout, with alternative 
hypotheses and underlying sample spaces spelled out in each application. 
Another new chapter (10), on estimation and testing hypotheses, reinforces
these ideas. On the whole, the reviewer would feel much happier about his 
beginning statistical education if these features had been incorporated in the 
edition he studied from. 

The other side of the coin, however, was the feeling of having been mathe- 
matically short-changed upon finishing the book. The chapters designed to 
stress the applications of statistical ideas—the bulk of the book, and for the 
most part unchanged—have not made use of the mathematical approach 
much more than in the first edition. A new chapter (12), on design of experi- 
ments, left the same impression. Furthermore, a number of minor faults in 
the first edition have not been corrected. 

One error in the preface (does anyone ever read the preface?) to this edi- 
tion is the suggestion that those interested only in applications can omit 
Chapter 11 (Small sample distributions). Surely Chapter 10 is meant. 

In a book of mathematics, it seems one should avoid empirical conclusions 
where possible. In particular, in the normal approximation to the binomial 
distribution, Hoel still writes “Experience shows that the approximation is
fairly good as long as $np > 5$ when $p \le \frac{1}{2}$, ....” A more explicit statement
would be that the mean is to be $np/\sqrt{npq}$ units from zero and hence
$np/\sqrt{npq} > 3$ implies $np > 9 - 9p$. (Page 87).
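
The algebra behind the reviewer's alternative rule is short: squaring the condition that the binomial mean lies more than three standard deviations from zero gives np > 9q, that is, np > 9 - 9p. The sketch below, with hypothetical function names and illustrative values of n and p, compares that condition with the textbook rule np > 5.

```python
from math import sqrt


def mean_in_sd_units(n, p):
    """Distance of the binomial mean np from zero, in standard deviations."""
    return n * p / sqrt(n * p * (1.0 - p))


def satisfies_three_sigma_rule(n, p):
    """Algebraically equivalent form of mean_in_sd_units(n, p) > 3."""
    return n * p > 9.0 - 9.0 * p


for n, p in [(10, 0.5), (20, 0.3), (60, 0.1), (100, 0.1)]:
    print(n, p, round(mean_in_sd_units(n, p), 2), n * p > 5, satisfies_three_sigma_rule(n, p))
```

For small p the two rules can disagree: with n = 60 and p = 0.1 the product np exceeds 5, yet the mean is still fewer than three standard deviations from zero.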

His approach to some of the mathematical details, if not inexact, is at
least confusing to this reader. On page 15 he states, “...the random
variable z is [now] just an ordinary variable of mathematics”, and on page
39, “... the likelihood function [of observed values] gives the probability
...”, which may lead an unsophisticated reader to wonder why this terminology
was introduced in the first place. Further to bewilder the unsophisticated,
he still does not give the normalizing constants of the F and t frequency
functions. One could as justifiably omit $1/\sqrt{2\pi}$ in discussing the normal
distribution, but one is hardly left with a text on mathematical statistics.
Probably the worst fault of all, from a pedagogical point of view, is the statement
on page 11: “This result shows what is intuitively clear, ... .” Is anything
in any text intuitively clear to the average student? If it is, it is probably
clear for an incorrect reason.

Lest one think that the book is nothing but a compounding of this type 
of thing, let the reviewer register some favorable impressions. The book 
dispenses a large amount of practical application with its theory, including 
many examples and exercises. For those of us unlucky (?) enough to have 
classes composed mainly of non-theoreticians, this is essential. Also, most of 
the objections of J. H. Curtiss, who reviewed the first edition in this Journal 
(June 1947), have been overcome. He stated, in part, “Any really serious 
adverse criticism would center about the treatment of regression, the omis- 
sion of point estimation theory, and the handling of some of the proofs.” 
The proofs remain simplified and non-rigorous, but the addition of Chapter 
10 and the rather complete revision of Chapter 7 (Regression and correlation) 
should be satisfactory. 

The first edition of the book was, in Curtiss’ opinion, “In view of its general
excellence, . . . the most important happening in the field of undergraduate
statistical texts in recent years.” This reviewer agrees, and concludes that
the second edition, which he awaited eagerly, is even better. As a mathe- 
matician, however, he will continue to interpolate additional mathematics 
into lectures from this book, and await, eagerly, a third edition. 


A Primer of Statistics for Political Scientists. V. O. Key, Jr. New York: Thomas
Y. Crowell Company, 1954. Pp. x, 209. $2.50.


George W. Snedecor, Iowa State College


This book is written for “ . . . the student who is unaware of the difference
between a square root and a standard deviation...” (whatever that
may mean). Five of the six chapters are devoted to “descriptive statistics”; 
frequency distributions, time series and interrelationships of time series, 
simple correlation and multiple relationships. The sixth chapter is on “infer- 
ences from quantitative data.” The illustrations are drawn almost entirely 
from the subject matter of political science. 

In the first 5 chapters the author rigidly restricts himself to descriptions 
of populations. His readers will not be oversold on the usefulness of this 
approach because there are abundant warnings of limitations and pitfalls. 
The frequency distribution is the only device that gets unqualified approval. 
Measures of central tendency are of “extremely limited descriptive utility.” 
Time series covering only 50 years are so short as to make it “most hazardous 
to develop a general cyclical theory.” Of two series moving parallel, “Care 
is required in the interpretation of the relations shown to exist by even 
such simple methods.” Concerning regression, “All the line does is to define 
the past direction; it does not probe into the future.” In general, “statistical 
procedures will not do one’s thinking for him,” and “until one checks on the
ground he does not know that even the most plausible explanation drawn
from the figures alone has a wisp of a foundation.” In these chapters, while 
the author pays lip service to the effectiveness of the methods described, his 
constant warnings against fallacies and the paucity of constructive conclu- 
sions demonstrate the limited utility of his concept of descriptive statistics. 

Teachers will be interested in the author’s avoidance of numbers and 
calculations. Graphs and verbal discussions constitute the bulk of the text. By 
this means the “ubiquity of the particular” is minimized. I assume that this 
is partly because the audience is supposed to be unskilled in quantitative 
thinking. But I wonder if this has not been carried too far. Computation is a 
discipline which can ill be dispensed with. Perhaps the most effective method 
of teaching lies between the two extremes. 

The last chapter is devoted to “Inferences from Quantitative Data.” This 
is largely discursive, “without resort to a statistical formula.” The t, F,
and chi-square distributions are not mentioned. In this discussion the author 
is not so happy as he was in his descriptions of populations. Among 12 in- 
ferential statements which I listed only 2 were correct. This does not surprise 
me; if I were to undertake an exposition of elementary political science my 
blunders would doubtless be far more numerous. The sad thing is that these 
30 pages are largely wasted. They might have been used to give adequate 
and usable methods for estimating and testing. 

In my opinion the author’s distinction between descriptive and inferential 
statistics is irrelevant. The only object of description is to facilitate inference. 
The purpose of examining data is to estimate probabilities that enter into the 
making of decisions. In practically all statistical investigations the inferences 
are extended beyond the data in hand, otherwise one would be wasting his 
time sorting dead bones. The pertinent distinction is one that the author 
flirts with in chapter 6, the randomness or non-randomness of samples. From 
random samples the inferences involve theoretically exact probabilities. 
Inferences from non-random samples imply faith in persistence or continuity, 
with probabilities that cannot easily be evaluated. Random sampling is the 
appropriate subject matter of elementary statistics. The graver problems of 
non-random sampling should be left to professionals in the specialized disci- 
plines involved. 

I respect the self-control that enabled the author to refrain from any sam- 
ple-to-population inference in his first five chapters but I think this restraint 
removed these chapters from the domain of statistics. Mere arithmetical 
and graphical descriptions of data have no more than historical value. From 
the reviewer’s standpoint, the chief interest in this text is not the statistics, 
which is of doubtful value, but the sage, humorous, and critical attitude of a 
political scientist toward the fascinating problems which he presents. 

P.S. I am sorry I couldn’t manage a judicious use of the word parameters 
because the author says that this is “a term which, judiciously used, creates 
an appearance of erudition.” 

Design and Analysis of Industrial Experiments. Edited by Owen L. Davies, 
Authors: G. E. P. Box, L. R. Connor, W. R. Cousins, O. L. Davies, F. R. Hims- 
worth, and G. P. Sillitto. New York: Hafner Publishing Company, 1954. Pp. 
xiii, 636. $10.00. 


Harold A. Freeman, Massachusetts Institute of Technology 


FOR years we waited for a good elementary textbook in mathematical 
statistics; along came, in rapid succession, Hoel, Weatherburn, and Mood. 
For years the field of sampling surveys had practically no book literature; 
now we have Deming, Hansen-Hurwitz-Madow, Cochran, Sukhatme, and 
Yates. The book situation was equally barren in experimental statistics, 
particularly as applied to batch (chemical) processes. Now, following 
Brownlee and Youden, we have Gore and, within weeks of each other, two 
comprehensive volumes, one by Bennett and Franklin (to be reviewed in 
this Journal) and the volume discussed here. 

Davies’ is a statistical handbook for experimenters and a very fine hand- 
book it is. The applications lie almost without exception in the general area 
of chemical processes but one does not need much experience or imagination 
to see that problems from many other areas can be readily described by the 
models given in the present volume. These models arise from the practical 
problems considered in the text—many of them encountered by the authors 
in their own researches—and are naturally specialized to these problems. 
But the reader with modest training in mathematical statistics will often 
be able to modify them to his needs. 

The technique of this volume is literary, with formulas and examples. 
There is some proving, most of it confined to appendices and particularly 
careful attention is paid to the assumptions underlying the models. The 
exposition is clear and accurate, and many topics of importance to experi- 
menters—and often neglected in the textbooks—are found here. These in- 
clude, for examples, lack of independence of errors, self-containment in 
experiments, sample sizes as functions of sampling and test costs, estimation 
of variances rather than the more usual formal analysis of variance, effect of 
departure from normality, missing values, Latin cubes, combining interac- 
tion mean squares. 

The body of the book is naturally given to experimental designs of the 
Fisherian sort, all centering on variance analysis, for little else seems to 
have made any mark in experimental statistics. The topics include simple 
comparisons, randomized blocks, Latin squares, incomplete blocks, factorial 
designs, confounding, and fractional factorial designs. Much of the content 
can probably be found elsewhere but it is good to find it in one place in such 
excellent form. There is also a fine chapter on sequential tests, including the 
details of (Barnard’s) sequential t. The most interesting chapter, to this 
reviewer, was 11, on the Box-Wilson technique—designs for multi-factor 
experiments which aim at the optimal value of a characteristic; for example, 
the levels of the ingredients which lead to the maximum strength of an 
alloy. So far as I know, this is the first detailed account in book form of this 
most interesting work. The numerical details are involved but as the method 
contributes powerfully to the solution of a problem often encountered by 
experimenters, it has a certain future and many experimenters will want to 
master it. They will find a thorough account of it here. 

This will be a book well worth keeping up to date, and it is the authors’ 
intention to do that. 


Statistics for Technologists. C. G. Paradine and B. H. P. Rivett. New York: 
D. Van Nostrand Company, Inc., 1954. Pp. vii, 288. $6.75. 


Gerald J. Lieberman, Stanford University 


THE widespread recognition of the applicability of statistics to problems in 
engineering and the physical sciences has demonstrated the need for educating 
the technologists in the field of statistics. As a result, numerous books 
in the field will be published, Statistics for Technologists being one of the first 
few. 

The content of the book is typical of standard texts for a first course in 
statistics. The preliminary chapters deal with frequency distributions, 
probability, and probability distributions. Tests of significance are presented 
for various hypotheses about parameters of these probability distributions. 
The book also contains brief chapters on quality control and sampling in- 
spection. The theory of errors and the method of least squares are presented 
together with illustrations of their use. The remaining chapters deal with 
correlation, analysis of variance, probit analysis, and the principle of maximum 
likelihood. 

The authors are to be commended on their point of view regarding the 
level of mathematics used throughout the text. Their attitude is best ex- 
pressed in their own words: 

“During the last decade or two there has been a considerable raising of the 
standard of mathematics required of candidates for degrees or higher national 
certificates in science and engineering. The students are used to a mathemati- 
cal presentation of the theory of their own subject. In the past many books 
on statistical methods have given formulas and rules of procedure without 
going into the underlying mathematical theory. In this book we try to steer 
a middle course, giving the mathematical derivation of most of the results 
required, with an introduction to their applications, in a sequence which it is 
hoped will appear logical and carry conviction. Although there can be no 
objection to taking the mathematics for granted if the operation of statistical 
methods is properly understood, the fact that statistical theory is still ad- 
vancing makes it the more desirable that the basis of standard procedures 
should be revealed. The student who has followed the arguments used in 
establishing a sampling distribution, for instance, is more likely to be aware 
of assumptions made in applying it and better equipped to read further in 
the subject.” 

The book contains many interesting examples and problems. Some of these 
are taken from examination papers of the Senate of London University, the 
Royal Statistical Society, and the Association of Incorporated Statisticians. 

Statistics for Technologists is supposed to be an introduction for students, 
research workers, and engineers to the principles of statistical methods and 
theory. These people, however, will find it lacking in many respects. There is 
almost no discussion of how to design an experiment. Tests of significance are 
treated only with respect to the error of type 1. The error of type 2 is never 
mentioned. This is paradoxical since in the chapter on quality control, the 
authors do devote space to determining the sample size necessary to detect 
a shift in the process average with a given probability. 
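
The link between the quality-control calculation and the error of type 2 can be made concrete with a small sketch. It is an illustration only, not the authors' procedure: it assumes a means chart with 3-sigma limits, normally distributed subgroup means, a known process standard deviation, and detection on the first subgroup after the shift. The function names are hypothetical.

```python
import math


def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * math.erfc(-z / math.sqrt(2))


def detection_probability(shift_in_sigmas, n, k=3.0):
    """Chance that one subgroup mean of size n falls outside k-sigma limits
    after the process average shifts by shift_in_sigmas process sigmas."""
    d = shift_in_sigmas * math.sqrt(n)
    return (1.0 - normal_cdf(k - d)) + normal_cdf(-k - d)


def smallest_subgroup_size(shift_in_sigmas, target, k=3.0, n_max=10_000):
    """Smallest n whose detection probability reaches the target."""
    for n in range(1, n_max + 1):
        if detection_probability(shift_in_sigmas, n, k) >= target:
            return n
    raise ValueError("target not reached for any n up to n_max")


# With no shift, the detection probability is the error of type 1.
print(round(detection_probability(0.0, 5), 4))
# One minus the detection probability is the error of type 2.
print(smallest_subgroup_size(1.0, 0.90))
```

Under these assumptions, fixing the detection probability for a given shift is the same as fixing the error of type 2, which is why the omission elsewhere in the book is paradoxical.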

The chapter on sampling inspection is meager, and contains very little in- 
formation about such important topics as sampling inspection by variables. 
This may not be very serious except that the references to other sources are 
very poor. No mention is made of such works as Sampling Inspection by 
Variables by A. H. Bowker and H. P. Goode and Statistical Quality Control 
by E. L. Grant. This is not only characteristic of this chapter but of almost 
all the chapters. It appears that the authors are not aware of many of these 
references. They state in the preface, for example, that “no originality can be 
claimed for the methods adopted, but it is thought that the use of the 
χ² table for the determination of a single sampling inspection scheme (page 
144) may not have been previously noted.” This result has appeared at 
least twice in the literature.¹ 

The book includes the topic of point estimation but hardly discusses the 
very important concept of confidence interval estimation. To the scientist 
this is an absolute must. Naturally this precludes the introduction of the new 
work on comparing means in the analysis of variance presented recently in the 
literature by Tukey and Scheffé. 

After reading the book, one gets the impression that this is a book which will 
help many a scientist, even though it lacks many of the qualities that are 
necessary to make it an “extra” good book. It presents the standard topics 
that are included in every text. However, it lacks many topics which are 
included in most texts, topics which are necessary to answer important 
questions that the scientist will ask. 


Quality Control: A Manual of Quality Control Procedure Based upon Scientific 
Principles and Simplified for Practical Application in Various Types of Manu- 
facturing Plants. Second Edition. Norbert L. Enrick. New York: Industrial Press, 
1954. Pp. viii, 181. $4.00. (Brighton, England: Machinery Publishing Co., Ltd.) 


Geoffrey Gregory, Stanford University 


IN THIS, the second edition of the book, Enrick has added five chapters 
to the original work, with the idea of introducing the reader to some more 
specialized techniques of the subject. Essentially Part I is the same as the 





¹ Paul Peach and Sebastian B. Littauer, “A note on sampling inspection,” Annals of Mathematical 
Statistics, 17 (1946), 81-4; Joseph Cameron, “Tables for constructing and for computing the operating 
characteristics of single sampling plans,” Industrial Quality Control, 9 (1952), 37-9. 
first edition of the book, which has been reviewed by John H. Curtiss in this 
Journal.¹ Additional explanatory passages have been inserted, however, and 
indications have been made of further developments of topics in the second 
part of the book. In particular, the author elaborates on the sampling risks 
of his sequential sampling plans, but it is perhaps a pity that some light 
could not have been shed over his use of the term “allowable per cent defective,” 
a major consideration in selecting a sampling plan. This additional 
material is well worth studying, however, if only as an indication of the care 
which should be exercised when a particular plan is selected. Also, consider- 
ing the wide use made at present of the Military Standards and the Dodge- 
Romig tables, it would have been of great assistance to the aspiring quality 
control engineer if at least some introduction to the terms used in selecting 
these plans could have been made. 

Continuing in his pleasantly easy style, in Part II of the book Enrick 
launches out into some of the more sophisticated techniques of the subject. 
His decision in the foreword to the first edition to design the book for practical 
men in inspection, and so eliminate any hint of “higher mathematics” here 
leads him into some difficulty. Such examples as testing for normality or the 
comparison of variations are described in principle but left without any 
precise procedure for making a decision. A similar observation may be made 
of the treatment of the analysis of variance. It would be impossible, of 
course, to describe the full implications of the analysis of variance in a book 
of this type. He does succeed, however, in giving some indication of the possi- 
bilities of these tests, and no doubt the inspector, new to these techniques, 
will find it a very useful introduction into their scope. As the author points 
out, some further instruction or assistance is necessary before application 
is made. 

The remaining added material is devoted mainly to a description of the 
conventional Shewhart control charts which had been curiously omitted from 
the first edition. A question might be raised about the significance level of 
different charts, since we are apparently using 2σ limits for the means 
charts and 3σ limits for the range charts. The standard procedure is used for 
the treatment of the attributes control charts. 
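
The question about significance levels can be illustrated with a short calculation. It is a sketch under a stated assumption: the plotted statistic is treated as normally distributed, which is reasonable for means but only a rough approximation for ranges. The function name is hypothetical.

```python
import math


def two_sided_tail(k):
    """P(|Z| > k) for a standard normal Z: the chance that an in-control
    point falls outside k-sigma limits."""
    return math.erfc(k / math.sqrt(2))


for k in (2, 3):
    print(f"{k}-sigma limits: about {two_sided_tail(k):.4f} per plotted point")
# 2-sigma limits: about 0.0455 per plotted point
# 3-sigma limits: about 0.0027 per plotted point
```

On this approximation a means chart with 2σ limits signals falsely roughly seventeen times as often as a chart with 3σ limits, so the two kinds of chart in the book do not operate at a common significance level.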

In its present form the book sets down in a simple manner the basic tech- 
niques of statistical quality control. It will be of interest to the non-technical 
man who wishes to obtain some insight into the possibilities and procedures 
of the subject. Throughout the book the theory is illustrated with examples 
of admirable clarity. To understand the techniques well enough to apply 
them, however, rather more than this is needed. Quality control engineering 
is a dangerous occupation for the amateur. As a textbook, then, this treatise 
can be considered as an introduction only to the recognized standard texts 
on the subject. As such it is well written and to the man who entertains 
doubts about the usefulness of the subject it has the virtue of being written 
by a man with considerable practical experience. 





¹ Volume 44 (1949), 139-141. 

Definitions and Symbols for Control Charts. New York: American Society for 
Quality Control, 1951. Pp. 8. $0.50. 


Edwin G. Olds, Carnegie Institute of Technology 


THE present document represents the work of a Committee of the American 
Society for Quality Control. It has been approved as a standard by that 
organization. 

Arranged in matrix form, for ready reference, this standard gives the terms, 
symbols and definitions for use in connection with control charts. The first 
section has fifteen general terms relating to control charts, the second section 
has twelve terms relating to control charts for variables, and the third 
section has terms relating to control charts for attributes. 

The Committee is to be commended for bringing together many of the 
principal ingredients of control-chart communication. With some general 
agreement on meaning and symbolism, mutual understanding should be 
improved and work in the field should benefit. 

Writing definitions is a tricky business because words must be defined in 
terms of other words which then may need further definition. It is difficult 
to decide where to start and when to stop. In the present instance, it is 
surprising that the committee stopped before defining control, or quality con- 
trol, or statistical quality control. Perhaps some definition of control is implied 
in the statement, “Assignable causes must be identified and removed to at- 
tain statistical control.” However, a much more direct attack on the difficult 
problem seems to be required. Other surprising omissions are chance cause, 
statistic, parameter, modified control limits, variance, expected value, and 
random sample. 

Most of the definitions given differ little from those to be found in quality 
control publications. In many cases the definition includes an example or a 
statement of use. Some readers may wish to object to the definition of 
“average” as the arithmetic mean and may find other definitions which 
could be made more precise. In general, however, the definitions given can be 
expected to convey correct notions, with the possible exception of the dual 
formulas for the standard deviation of p, pn, u, or c. 

When proposing a standard set of symbols for use in a particular field 
there seem to be at least three questions which might be raised: 

(1) Are the proposed symbols those which have already been widely used? 

(2) Will adoption of the proposed symbolism promote better communica- 
tion with workers in neighboring fields? 

(3) Is the pattern of symbolism such as to allow easy extension as the work 
in the field deepens and broadens? 

In answer to the first question, it might be noted that the present standard 
retains, with little change, the set of symbols used in the A.S.T.M. manuals, 
On Presentation of Data, and On Quality Control of Materials. The prime nota- 
tion is “used to designate the true or objective value.” This is the same con- 
vention used in the American Standards Association’s publication Control 
Chart Method for Controlling Quality During Production, one of the most 
widely-used guides for the application of the control-chart procedure. Sigma 
is used, as in the above publication, to denote the sample standard deviation. 
However, a footnote kindly authorizes two departures from standard: (a) 
the use of the single prime notation may be restricted to universe values and 
a double-prime notation used for standard values and (b) Fisher’s s may be 
used for control charts for standard deviation. 

While the suggested symbolism bears the stamp of approval for past 
usage, it has created, and promises to continue to create, problems with 
regard to both intercommunication and growth. Many students of statistical 
quality control believe that there is an opportunity to apply a variety of 
statistical methods in the improvement of the manufacturing operation. 
When they move out of the area of simple control charts they are faced with 
the need to learn a new notation. This reviewer has observed the anguished 
struggles with this difficulty and he wishes that it could be avoided. 

Probably the committee gave sober consideration to the difficulty noted 
above but it is not clear that its decision was the best one. Perhaps it would 
have been better to wait and hope that the statistical world would meet in 
a common symbolism which would well serve the needs of all specialists. 
The formal adoption of the proposed symbolism as a standard sets up a 
barrier to this much needed development which may be hard to hurdle. 
However, any adverse criticism should be tempered with the recognition that 
progress toward a unified symbolism in the field of statistics has been pain- 
fully slow. Possibly the ASQC action will revive interest in the project and 
will speed developments. 


Statistical Astronomy. Robert J. Trumpler (Professor of Astronomy, Emeritus, 
University of California) and Harold F. Weaver (Associate Professor of Astronomy, 
University of California). Berkeley and Los Angeles: University of California 
Press, 1953. Pp. xiv, 644. $7.50. 


W. Edwards Deming, New York University 


THE aim of the book (not stated) is to explain in the language of the 
astronomer, and to the astronomer, some of the methods of statistics 
and their uses in astronomy. Looking back at history, one might say that the 
aim is to acquaint astronomers with their own methods, because astronomy, 
like other sciences, is statistical in its quantitative problems. Moreover, it 
was astronomers like Gauss, Legendre, Czuber, and Helmert who made 
great advances in statistical theory as a result of recognizing the nature of 
their problems. In the last three decades, however, statistical theorists have 
pushed ahead, so that now it is the astronomers who find themselves in need 
of a compendium of recent statistical theory—a field in which they were at 
one time preeminent. 

The statistical problems in astronomy deal with the estimation of the 
parameters of orbits, with the ratios of weights, with velocities, periodicities, 
chemical compositions, calibration of instruments, the space-distribution of 
stars, measurement of the intensities of light of various wave lengths, 
measurement of other spectral characteristics, problems of curve fitting, 
interrelationships between the physical characteristics of stars and of galaxies. 
In addition, there are the usual statistical problems that confront the experi- 
mental scientist everywhere in the discovery and removal of systematic 
errors and biases, and the design of experiments so as to extract the greatest 
amount of information from whatever observations it may be possible to 
take. 

All the various statistical techniques are useful in astronomy—distribu- 
tions, theory of sampling, testing of hypotheses, analysis of variance, design 
of experiment, tests for randomness, statistical theory of prediction, of 
functional relationships. 

It is now known to many physical scientists that the theory of statistical 
design and of inference are a vital part of precise experimental work. Even 
the greatest experts in the most refined physical measurements find that 
statistical theory helps them to discover and remove systematic errors and 
biases, and helps them to plan an experiment with the correct amount of 
replication. 

Part I in 8 chapters deals with statistical theory on an intermediate level, 
all the while with examples in astronomy, and in the vocabulary of the 
astronomer. Part II in 3 chapters deals with the statistical description of the 
galactic system. Part III in 6 chapters deals with stellar motions in the 
vicinity of the sun. Part IV in 3 chapters deals with luminosity. Part V in 
5 chapters deals with the space-distribution of stars. Part VI in 3 chapters 
deals with galactic rotation. 

A special feature of the book, to the statistician, is Chapter 8 in Part I on 
the testing of hypotheses, contributed by Elizabeth Scott. In the other 
chapters of Part I there is considerable emphasis on the transformation of 
variables and to relations between frequency functions. The mature statisti- 
cian and the beginner will both find much of interest here, as well as in the 
whole book. 

The authors may give the impression in the beginning lines of the intro- 
duction that a statistical investigation is concerned with the quantitative 
description of a group of objects or individuals, or that it is to gain informa- 
tion on the distributions and interrelations of certain attributes that char- 
acterize the individuals of the group. They can not be blamed: a prominent 
text-book in statistics written by statisticians uses almost these same words 
to describe the statistical method. Fortunately, by the time the astronomer 
has read to the middle of the first page of the introduction he may perceive 
that the foregoing description of a statistical investigation is happily not so 
after all; that the real aim in modern statistical theory is to discover the causes 
of the distributions and of the interrelationships. Elizabeth Scott in her chap- 
ter is clear: “we should like to make the correct decision as often as possi- 
ble”, and statistical theory helps the astronomer or anyone else to do this. 

Page 237 will bring smiles to a statistician who has engaged in studies of 
human populations. We read there that the procurement of information from 
every member of a human population is comparatively simple. All that he 
needs is a suitable questionnaire (easily devised, no doubt), and there is 
apparently no trouble in finding all the people, and in persuading them to 
answer. Then begins the work of distributions and interrelations. In contrast, 
the astronomer has difficulty in procuring the information from his stars! 
All of which reminds me of the lawyer who remarked recently that a census 
of employment and of unemployment should be the simplest thing in the 
world! Everybody else’s job is simple. 

The book and 1953, its date, will doubtless be known, and justly, as a great 
achievement in astronomy and in the other physical sciences. It is recom- 
mended to the statistician who wishes to learn his subject from the stand- 
point of the scientist and to see statistical theory applied to the universe of 
stars and of galaxies, from the earth to the outermost reaches of the astrono- 
mer’s vision, 100,000,000 or more light-years away. 


Table of Binomial Coefficients. Prepared for the Mathematical Tables Com- 
mittees of the British Association and the Royal Society, under the editorship of 
J.C. P. Miller. Royal Society Mathematical Tables, Volume 3. Cambridge: 
Published for the Royal Society at the University Press, 1954. Pp. viii, 162. $6.50. 

THIS table shows the number of combinations of n things r at a time for 
all values of r and n such that r ≤ n/2 ≤ 100. This in effect covers all 
values through n=200. For each n, the maximum value is (or values are) 
printed in boldface. The tabulated values are exact, including as many as 
59 significant figures. In addition, the table covers values of r ≤ 12 for n ≤ 500, 
r ≤ 11 for n ≤ 1000, r ≤ 5 for n ≤ 2000, and r ≤ 3 for n ≤ 5000. Reference is 
given to a table published in 1762 covering r ≤ 2 for n ≤ 20,000. 

W. A. W. 
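
The coverage described in this notice can be checked with exact integer arithmetic. The sketch below is an illustration only; the helper name is hypothetical, and `math.comb` requires Python 3.8 or later. It relies on the symmetry C(n, r) = C(n, n - r), which is why tabulating r up to n/2 covers every r for a given n.

```python
import math


def half_row(n):
    """Exact coefficients C(n, r) for r = 0 .. n // 2, the half of the row
    that needs to be printed; the other half follows by symmetry."""
    return [math.comb(n, r) for r in range(n // 2 + 1)]


# Symmetry: a value beyond n/2 is recovered from the tabulated half.
assert math.comb(200, 163) == half_row(200)[37]

# The largest entry for n = 200 is the central coefficient.
largest = max(half_row(200))
assert largest == math.comb(200, 100)
print(len(str(largest)))  # 59

# For odd n the maximum occurs twice, at r = (n - 1) / 2 and r = (n + 1) / 2.
assert math.comb(199, 99) == math.comb(199, 100)
```

The central coefficient for n = 200 has 59 digits, which agrees with the figure quoted for the longest exact entry.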


Introductory College Mathematics. Adele Leonhardy. New York: John Wiley 
and Sons. Pp. ix, 459. $4.90. 


D. A. Darling, University of Michigan 


IN HER own words the author has designed this book “primarily to meet the 
needs of the college student who does not plan to specialize in mathematics 
or the related sciences.” It is therefore potentially of interest to teachers of 
statistics. The prerequisites are a minimum of one year of high-school algebra 
and one year of plane geometry. The main emphasis is “on what and why 
and not merely how”, and generally speaking Miss Leonhardy strives for a 
broad and mature understanding rather than for the development of manipu- 
lative skills. 

To carry out this program the author treats the historical development of 
mathematics and stresses often the arbitrary nature of its postulates. Thus 
there is a discussion of non-Euclidean geometries, numbers expressed in 
non-denary systems, non-commutative algebras, etc. There is a treatment 
of elementary set algebra and the related syllogistic system. Operations with 
the class of positive integers are treated axiomatically, as are their extensions 
to negative integers and “fractions.” The first three chapters deal with alge- 
bra through roots and powers. Chapters 4 and 5 are on measurement, com- 
putation, logarithms, ratio, percentage, index numbers. Chapters 6, 7, and 
8 treat variables, functions, functional relationship, curve plotting, simple 
polynomials, roots, nomographs, variation, limits, derivatives (to the power 
function), antiderivatives (of polynomials). Chapters 9 and 10 treat the 
exponential, logarithmic, and periodic functions. 

Of interest to readers of this journal is the penultimate chapter, 11, en- 
titled “Simple Statistical Methods”. The author defines “mathematical (or a 
priori) probability” by the “equally likely” decomposition and distinguishes 
this from “statistical (or a posteriori) probability” based on empirical fre- 
quencies. She says of the latter definition that “a large enough number of 
cases must be observed so that the conclusions can be sound.” There is a 
description of frequency charts, mean, median, mode with a number of illus- 
trations followed by a formula called the “normal frequency curve”. There 
is next a definition of the standard deviation and a paragraph or so discussing 
“correlation”. 

The book closes with a chapter sketching the development of mathematics 
and its possible future. 

Certainly any teacher would give the author his best wishes for the success 
of this book, and would like to believe that the beginning student could gain 
“a rich background in mathematics” from a course based on it. There is a 
great deal of chatty discussion throughout and informal, lengthy motivation. 
There are numerous problems, many of the “thought” type: “What is a 
proposition?”, “What are the materials of mathematics?”, “What are the 
natural numbers”, “Discuss the law of large numbers in probability”, “De- 
fine the indefinite integral of a function in terms of symbols.” All in all, there 
is a great deal of reading to be done by the student and at the end of each 
chapter there is a supplementary bibliography which in many cases, in the 
reviewer’s opinion, is unrealistic to suggest to a recent high school graduate. 
For example, at the end of the first chapter there are references to Carnap’s 
“Foundations of Logic and Mathematics” and Hilbert’s “Principles of 
Mathematical Logic”. 

It is not difficult to find many technical faults, but it is perhaps not ger- 
mane to do so with a book of this type. One might remark that the author 
really does not have a postulational treatment of the integers, etc., at all 
despite her lengthy and meticulous attempts. It also seems unfortunate 
that she has had to refer in various places for simple proofs to texts in college 
algebra and has developed insufficient machinery, for example, even to 
state the binomial theorem. There are also the usual fuzzy definitions of lim- 
its, derivatives, etc. 

As a preparation for later statistical work by the student, the reviewer 
feels that this book is a step in the right direction in that set theory, probability, 
and the basic statistical ideas are introduced to the student early, 
but it is somewhat doubtful that he would not gain as much or more from 
the standard courses, supplemented by these topics. The reviewer personally 
would like to try a course based on a book like this to resolve the doubt. 


Career perspectives in a bureaucratic setting. Dwaine Marvick. Michigan Gov- 
ernmental Studies, No. 27. Ann Arbor: University of Michigan Press, 1954. Pp. 
viii, 150. $2.25. 


Dael Wolfle, American Association for the Advancement of Science 


TWO hundred and four employees of the Office of Naval Research were 
studied by means of a two- or three-hour interview and a questionnaire 
in which each employee supplied information concerning his personal history, 
education, career plans, and values. The employees were classified in groups 
of almost equal size as follows: “institutionalists”—civil service or military 
persons who looked forward to careers in government service; “specialists” 
—scientists, engineers, accountants, lawyers, who looked forward to careers 
in their specialties rather than in a particular place of employment; and 
“hybrids”—who were both government service and specialty persons and who 
looked forward to careers combining both interests. 

Various predictions concerning the three groups were made, e.g., that the 
specialists would be more likely than the institutionalists to consider oppor- 
tunity to learn and opportunity to do original work as very important fac- 
tors in their jobs and that the institutionalists would be more likely than the 
specialists to consider security, salary, and organizational prestige as very 
important. Most of the predictions were borne out by the interview and ques- 
tionnaire data. 

Results of the study are given in 33 tables, 12 figures, and too many words. 

The monograph ends with a brief discussion of implications for manage- 
ment. As an example: since specialists are not very firmly attached to a par- 
ticular place of employment, management could appropriately “bend its 
efforts toward inculcating a greater attachment to the institutional place 
and toward stressing the relative advantages of this agency.” 


Local Community Fact Book for Chicago, 1950. Edited by Philip M. Hauser and 
Evelyn M. Kitagawa. Chicago: Chicago Community Inventory, University of 
Chicago, 1953. Pp. xx, 310. $25.00. 


Sanford M. Dornbusch, University of Washington 


After the decennial census of 1930, statistics for the 75 community areas
of Chicago were presented for the first time in a single volume. The suc- 
cess of the initial venture led to its repetition using data from the 1940 
census, but the second Fact Book did not appear until 1949. The editors of 
this third member of the series deserve credit for its speedy publication. 
Their haste has not led to any decline in quality, for the data are complete
and well-organized. 

In addition to published census materials, the Fact Book incorporates the 
results of special tabulations from census tract summary cards and data from 
local agencies. This volume presents information on many characteristics not 
covered in preceding editions; namely, class of worker, major industry group, 
family income, migration status, tuberculosis rate, public assistance rate, 
number of persons in the dwelling unit, type of fuel, presence of television, 
mortgage status of owner-occupied homes, number of married couples not 
living in their own household, and gross land area. A final noteworthy in- 
clusion is material on retail stores and retail trade taken from the 1948 Cen- 
sus of Business. 

A minor deficiency of this new version is the deletion of the sex breakdown 
of material on individual income. Although the new information on family 
income is most welcome, it does not permit examination of the changing 
pattern of income distribution within the female component of the labor 
force. 

The Negro population of Chicago increased 77.2 per cent from 1940 to 
1950, and the changed organization of the Fact Book reflects this rapid rate 
of growth. Color breakdowns of data have been added for the following kinds 
of material: age, marital status, school years completed, type of household, 
employment status, major occupation group, characteristics of dwelling 
units, births, and deaths. 

A major improvement is the addition of a history of each community area. 
The recent developments which are noted illustrate the increasing hetero- 
geneity within community areas. This is the price that must be paid to 
preserve comparability for trend analysis. This reviewer does not agree with 
the editors’ view of most of the community areas as containing persons who 
are aware of common community interests. The conception of the com- 
munity area as a natural area would call for radical redefinition of bounda- 
ries to allow for recent migration. 


Labor Mobility in Six Cities. Gladys L. Palmer. New York: Social Science Re- 
search Council, 1954. Pp. 177. $2.75. 


David R. Roberts, Carnegie Institute of Technology


Labor Mobility in Six Cities is a landmark in the study of mobility. Its
significance can be fully perceived only in historical perspective. Very 
little was known about the extent, the incidence, or the pattern of worker 
mobility prior to the 1930’s. Economists who might have initiated research 
were uninterested because one of the simplifying assumptions of the accepted 
body of theory was the perfect mobility of all input factors including labor. 
The existence of immobility was recognized but it was treated as a friction 
which would modify theoretic results but would not invalidate them. It 
was not deemed necessary to engage in empirical investigations of the phe-
nomenon. Psychology and sociology, being concerned with individual and
group behavior, were in principle interested in job mobility, but psychologists 
and sociologists were preoccupied with research in other parts of their fields. 
Practical people, such as businessmen and government officials, were not 
concerned with mobility because it did not pose problems for them. During 
the 1920’s some business organizations became interested in reducing labor 
turnover as a means of cutting costs, but because of the narrowness of their 
objective they did not learn much about mobility per se. 

During the 1930’s the practical problem of administering relief high- 
lighted both the need for and the lack of knowledge about labor mobility. 
For example, to what extent can people be expected to leave depressed in- 
dustries, occupations, and areas? Which groups move and which stay? etc. 
During this period and the early 1940’s, a number of mobility studies were 
made; they were all small and sought to illuminate the labor market behavior 
of particular groups of workers. One, for example, studied the mobility of 
weavers in three textile centers; another the mobility of former employees 
of a large textile mill which had failed; still another, the reemployment of 
Philadelphia hosiery workers after the 1933-34 shut downs. Like the early 
experiments in other areas of scientific inquiry these studies provided a little 
factual information but primarily they suggested hypotheses, possible ex- 
planations of who does and who does not change jobs, the areas within which 
changes occur, etc. Substantively, they were not of much general use because 
their coverage was too narrow a base for generalization and their findings 
were frequently inconclusive or inconsistent with those of other studies hav- 
ing as good claims to acceptance. 

This was the state of knowledge prior to the present study. It is a large 
empirical undertaking conducted under the leadership of Gladys L. Palmer 
for the Labor Market Research Committee of the Social Science Research 
Council, drawing upon the facilities of seven university research centers and 
the Bureau of the Census. The basic data were provided by a statistically 
designed sample survey of the populations of six northern, industrial cities: 
Philadelphia, New Haven, Chicago, St. Paul, San Francisco, and Los Angeles.
The field work was done by the Census Bureau and the analysis by groups at 
the Universities of California, Chicago, Minnesota, Pennsylvania, Yale 
University, and Massachusetts Institute of Technology. Work histories 
were obtained covering the decade of the 1940’s for individuals 25 or more 
years of age who had worked for a month or more in the year 1950. The 
findings are based upon the analysis of those work histories. They are far 
too numerous and technical for detailed presentation. The major ones fol- 
low: 

1. Mobility is not characteristic of all members of a labor force but is
   concentrated within certain parts.

2. Differences in the incidence of mobility among different groups of
   workers and the kinds of job shifts made follow a similar pattern in
   different cities, regardless of whether a city’s degree of mobility is
   relatively high or low.

3. There are differentials in the incidence of mobility at various levels of
   skill, but even highly stable occupational groups have mobile segments.

4. A labor force adapts more readily to changes in the industrial demand
   for labor than to changes in the occupational structure.

5. Persistent intercity contrasts suggest the existence of area differentials
   in mobility.

6. Expanding employment in a city attracts workers from other areas, and
   migrants are relatively flexible in adjusting to labor market changes.

7. Workers who are experienced in certain occupations can transfer their
   skills to certain others, but there is a limit to the amount of
   interchange between levels of skill.

8. When employment is high, voluntary job changes outnumber involuntary
   changes and tend to reflect an improvement in economic position and in
   the knowledge and skills of workers.


There is a wealth of statistical breakdowns resulting from the cross-classifica- 
tion of the sample by city, age, sex, race, marital status, occupation, industry, 
employer shifts, industry shifts, occupational shifts, migration, etc. 
Rather than attempt a recapitulation, it may be more fruitful to consider 
the two following questions: (1) To what extent did the study succeed in 
determining the extent, incidence, and patterns of mobility within the area 
examined? (2) How far can its findings be generalized? With respect to the
first question, the results are mixed. Findings about the extent and patterns
of mobility are conclusive and constitute an important contribution to 
knowledge. Never before has there been such comprehensive information, 
even for a single area, on the amount of job movement of its population and
on the character of the movement, i.e., how it breaks down into employer 
shifts, industry shifts, occupation shifts, geographic shifts and combinations 
of these. 


The findings about incidence are less satisfactory. They establish that a 
large portion of the job shifts are made by a small fraction of the workers—a 
finding which had been foreshadowed by earlier studies. They also establish that
mobility varies inversely with age and suggest strongly that the people of
higher socioeconomic rank are, in general, less mobile than those of lower 
rank. However, the attempt to distinguish the mobile people further, in 
terms of sex, race, industry, etc., yielded disappointing results. The incon- 
clusiveness of the findings shows up in the fact that significant differences in 
mobility rates do not appear when the data are classified according to sex, 
according to race, etc. The design of the study tends to obscure such differ- 
ences to some degree. The people in the sample were classified according to 
their 1950 characteristics and their work histories for the whole preceding 
decade were then imputed to those characteristics. For example, if a man 
married for the first time in 1950 his job history for the preceding decade 
would be associated with the status “married,” even though nine-tenths of
it pertained to the status “single.” Another factor of importance is the size 
of the sample used. In order to isolate the effect of any single characteristic, 
e.g., sex, upon mobility it is necessary to compare the mobility rates of groups
of people who are similar in all significant respects except sex, e.g., age,
occupation, industry, etc. Such extensive cross-classification is impossible
within the limits of statistical reliability unless the sample is much larger 
than that employed in the six-city study. It was necessary, carrying forward 
the foregoing example, to compare mobility rates for men and women with- 
out providing uniformity in respect to age, occupation, etc. To the extent that 
differences in the latter characteristics influence mobility the true effect of 
sex does not show up. The omission from the sample, probably for financial 
reasons, of people who lived in the six cities at some time during the decade 
but moved away before the survey date, biases the results and may be re- 
sponsible for some of the inconclusiveness about incidence, though this fac- 
tor is less important than those mentioned above. 
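
The arithmetic behind the sample-size point can be sketched briefly. The figures below are hypothetical, chosen only for illustration; they are not the sample size or the classifications actually used in the six-city study. The sketch simply divides a sample evenly over every combination of the classifying characteristics.

```python
from math import prod


def average_cell_size(sample_size: int, categories_per_factor: dict[str, int]) -> float:
    """Average number of persons per cell if the sample were spread evenly
    over every combination of the classifying factors."""
    n_cells = prod(categories_per_factor.values())
    return sample_size / n_cells


# Hypothetical figures for illustration only (not taken from the study).
SAMPLE_SIZE = 13_000
FACTORS = {"sex": 2, "age group": 5, "occupation group": 9, "industry group": 10}

# Add one classifying factor at a time and watch the cells thin out.
used: dict[str, int] = {}
for name, levels in FACTORS.items():
    used[name] = levels
    print(
        f"{' x '.join(used)}: {prod(used.values())} cells, "
        f"about {average_cell_size(SAMPLE_SIZE, used):.0f} persons per cell"
    )
```

Under these assumed figures, a sample that looks ample when split by sex alone leaves only a handful of persons in each cell once age, occupation, and industry are held constant at the same time.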

The question of generalizing the six-city findings is complex. In principle, 
generalization is valid only for areas which are similar in essential respects to 
those used in the original study. A difficulty is that in the present state of 
knowledge it is not at all clear which aspects of an area are essential in 
respect to the mobility of its inhabitants. Only additional studies based on 
areas which differ from the six cities can supply that information. In the 
meanwhile generalization must be based on judgment. Regardless of the 
generality of the six-city results, the study has provided a set of bench- 
marks and some notions about mobility which will be a valuable frame of 
reference for future workers. A less tangible, though potentially significant, 
contribution is the guidance which the study provides for future research 
in the form of now-recognized hypotheses which were not apparent until 
illuminated by the evidence amassed. 

Without question, this is the most important piece of work in the field
and one whose significance must be measured both in terms of its substantive 
contribution to knowledge and the influence which it will exert on future 
research. 


Industrial Pensions. C. L. Dearing. Washington: The Brookings Institution, 
1954. Pp. x, 310. $3.75. 


H. W. Steinhaus, The Equitable Life Assurance Society


When the United States Supreme Court, on April 25, 1949, upheld the
Inland Steel decision which made pensions subject to collective bargain-
ing under the Taft-Hartley Act, economists familiar with pension fund ac-
cumulations became concerned with the possible consequences if organized
labor were to succeed in obtaining substantial pension rights which in turn 
would lead to tremendous pension reserves. The Brookings Institution be-
gan a study of this formidable problem early in 1950 and published its con-
clusions in May 1954, some 4 years later. It was an exploratory undertaking
in a new field, made particularly difficult by the need to evaluate and project 
numerous factors which enter into the creation of private retirement security 
on a national scale. 

These factors involve (1) estimates of the future number of aged, their 
longevity and their private resources, which would influence the trend of 
pension demands; (2) estimates of the number of people in the labor force, 
their earning power and productivity, which would determine the capacity 
to absorb the cost; (3) the relation of savings for retirement to savings gener-
ally, and the amount of such savings in relation to investment demands;
and (4) the evaluation of the trend in pension plans, as to inclusiveness, 
degree of vesting, method of financing, ability of unions to promote, ability 
of employers to resist, and government influence through taxation or publicly 
administered schemes. 

Unfortunately, there is no point in discussing the results of any of these 
estimates, since most of the basic premises used for projections have already 
become obsolete. There have been fundamental changes in the Social Security 
Act and the Revenue Code. The savings figures of 1949 and prior years, in 
total as well as by income brackets, which were used for projections have 
proved invalid in the light of more recent experience. Such newer develop- 
ments as investments in equities, portable pensions, and the change in 
union emphasis on the tie-in between private and social pensions, were not 
evaluated. Unwittingly perhaps, but forcefully, the book demonstrates the 
futility of attempting projections in an economy as dynamic as ours. 

Nevertheless, the student of the subject will find much of interest in a 
review of the techniques developed for the various estimates. Economists 
may benefit from the painstaking search into methods to obtain a rough
estimate of the future of industrial expansion which in turn influences both 
the supply of funds and the investment potentials. Chapter VII, for instance, 
estimates the annual flow of money into pension funds, by rate, volume, 
and characteristics, by utilizing data supplied by 297 corporations on a special
questionnaire. Because of great variations, each major industry was analyzed 
separately, and it was concluded that increases in retirement savings will not 
be fully offset by decreases in other forms of savings. 

In Chapter VIII a projection was made of national income changes and of 
personal savings particularly, and then a detailed analysis was undertaken of 
future investment outlets in the form of federal, state, and local government 
debt increases, mortgage potentials, and corporate security issues. This in- 
volves, in the public sector, construction of buildings for public education, 
highways, and urban redevelopments; in the private sector, residences, 
plant expansion, and new industries. It was concluded that some portions of 
the total supply of new retirement savings might be unable to find ready
employment. 

Chapter IX appraises the basic conflicts in policy and objectives that have 
characterized the pension development and considers how these conflicts 
may be resolved. The final chapter, X, attempts to resolve the vital issue of 
allocating responsibility for retirement financing. In spite of all shortcomings,
these four chapters contain challenging thoughts and open up new avenues 
for appraising the economic effects of retirement security established through 
private means. Above all the study convincingly demonstrates that retire- 
ment security on a national scale affects every phase of our economic life. 


The Share of Financial Intermediaries in National Wealth and National Assets, 
1900-1949. Occasional Paper #42, Studies in Capital Formation and Financing. 
Raymond W. Goldsmith. New York: National Bureau of Economic Research. 
1954. Pp. 120. $1.50. 


George Garvy, Federal Reserve Bank of New York


This study summarizes the substantive finding on one aspect of the
saving process. This and other aspects will be considered in full in a forth-
coming monograph on the growth of financial intermediaries and their role
in the capital markets. The monograph will be part of the project on capital 
formation and financing sponsored by the Life Insurance Association of 
America. Several Occasional Papers stemming from this same research project 
(perhaps the most important undertaking of the National Bureau of Eco- 
nomic Research since the war) have been published already. 

Goldsmith’s study presents new estimates for selected years of aggregate 
assets held by specific groups of financial intermediaries, the shares of these 
groups of intermediaries in the main types of assets and liabilities, and the 
share of financial intermediaries in national wealth and assets. One chapter 
is devoted to each of these topics. A discussion of the nature and the limita- 
tions of the study and a concise summary of the findings complete the report. 

Goldsmith’s paper is essentially limited to the presentation of new statisti- 
cal material. Some of the new series have been developed in connection with 
the author’s three-volume study of Savings in the United States, now in press.
Since fragmentary basic data preclude the preparation of detailed continuous 
annual series for the most important financial intermediaries, Goldsmith 
has chosen to clarify the main movements by providing estimates for eight 
benchmark years between 1900 and 1949. 

The essence of the report is in the 27 tables. Seven charts depict salient 
changes over the fifty years. The text in the main describes changes as they 
unfold between the consecutive years selected; the emphasis is on the what, 
not on the why. A provocative interpretation of the factual material is, 
however, contained in the introduction by Simon Kuznets, who was in charge 
of the over-all project and who endeavors to indicate the place of Gold- 
smith’s findings in the cooperative project of which it forms a part. 

The present study provides for the first time comprehensive statistical 
evaluation of the role of financial intermediaries in external capital formation 
since the turn of the century. The evaluation of the statistical spade work 
underlying the various tables presented in this report must await the publi- 
cation of the full monograph (one can only guess the amount of ingenuity 
and statistical skill that must have been involved in fitting the pieces to-
gether); so must our curiosity concerning the author’s analysis and inter-
pretation of the role of financial intermediaries in the saving process.

In the meantime, two questions may be raised, one about the concept of 
financial intermediaries and the other about the grouping of the various inter-
mediaries. The author does not discuss why he has made the concept of finan-
cial intermediaries so broad as to include social security funds, the Federal
Reserve System, and the various government lending agencies; he merely 
lists in a footnote the institutions included. The justification of the choice 
made must await publication of the monograph. But the concept used is cer- 
tainly broader than suggested in Kuznets’ introduction, where financial inter- 
mediaries are defined as “institutions engaged in investing funds mobilized 
from a large number of individual and other savers.” Views may differ as to 
whether all institutions holding intangible assets or only those which are 
repositories of savings should be classified as financial intermediaries. And 
surely, the “mobilization of savings” through social security reserve funds 
differs in many important respects (including their availability to meet 
deficits in the private sectors of the economy) from the accumulation of 
assets in savings institutions. 

Fortunately, detail given even in this condensed preliminary report 
makes it possible for the analyst who prefers a narrower definition or different 
groupings of financial intermediaries to derive alternative aggregates and 
percentage distributions by simple arithmetical computations: the building 
blocks are there. To facilitate analysis, the basic data are organized on a 
uniform plan: thus, the same twenty categories are included in the stubs of 
all tables in the chapter on the share of financial intermediaries in various 
types of assets even though the lines for Federal Reserve Banks, the postal 
savings system, and government lending institutions show blanks except in 
the table on holdings of government securities. 
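
The kind of recomputation meant here can be sketched in a few lines. The asset figures below are invented placeholders in arbitrary units, not values from Goldsmith's tables, and the function name `regroup` is likewise hypothetical; the point is only that dropping some categories and re-deriving the aggregate and the percentage distribution is simple arithmetic.

```python
# Hypothetical asset figures (arbitrary units) for one benchmark year.
# These are placeholders for illustration, not data from the report.
ASSETS = {
    "commercial banks": 50.0,
    "life insurance companies": 20.0,
    "Federal Reserve Banks": 15.0,
    "government lending institutions": 10.0,
    "social security funds": 5.0,
}

# Categories an analyst preferring a narrower definition might leave out.
PUBLIC = {"Federal Reserve Banks", "government lending institutions", "social security funds"}


def regroup(assets, excluded=()):
    """Return the aggregate and percentage distribution of the categories kept."""
    kept = {name: value for name, value in assets.items() if name not in excluded}
    total = sum(kept.values())
    distribution = {name: 100.0 * value / total for name, value in kept.items()}
    return total, distribution


broad_total, broad_shares = regroup(ASSETS)
narrow_total, narrow_shares = regroup(ASSETS, excluded=PUBLIC)

print(f"broad aggregate: {broad_total:.1f}; narrow aggregate: {narrow_total:.1f}")
for name, share in narrow_shares.items():
    print(f"  {name}: {share:.1f} per cent of the narrow aggregate")
```

Because every table uses the same stub categories, one such routine could in principle be applied to each table in turn.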

Treating government institutions (including such government lending 
agencies as the Reconstruction Finance Corporation and the Home Owners 
Loan Corporation, which obtained practically all their funds from the Treas- 
ury) as just another type of financial intermediary is, perhaps, the most 
debatable aspect of the study previewed in Goldsmith’s report. The author
by no means ignores the problem; brief reference is made (p. 102) to a differ-
ent grouping of financial intermediaries into private and public sectors.
Nearly the entire growth in the ratio of assets of financial intermediaries 
to national assets (the most comprehensive among the several ratios used to 
measure their relative importance) between 1929 and 1949 reflects the growth 
of public intermediaries. The growth of public intermediaries (from 1 to 5 
per cent) reflects essentially the monetization of public debt by the Federal 
Reserve System and the growth of social security (mainly OASI) reserve 
funds; in the thirties, the substitution of government credit for private 
credit (by acquisition of mortgages, preferred stock and other assets) by 
government lending agencies was another important factor. 

The present report serves well the double aim pursued by the National
Bureau of Economic Research in its Occasional Papers: to make available 
to students as promptly as possible new statistical data and the most im- 
portant substantive findings by short-cutting the delays involved in the 
publication of substantial monographs, and to whet the appetite for the 
main course. 


The Volume of Residential Construction, 1889-1950. Technical Paper Number 
9, Studies in Capital Formation and Financing. David M. Blank. New York: 
National Bureau of Economic Research. 1954. Pp. 99. $1.50. 


Albert H. Schaaf, University of California (Berkeley)


The primary purpose of Mr. Blank’s monograph is to describe the methods
used in a new estimate of residential construction for the years 1889 to
1920. The estimate was prepared as a part of a larger study on the financing 
and formation of capital in residential construction which is in turn to be a 
part of a National Bureau of Economic Research project on long-term capital 
accumulation and financing in the United States. 

The new construction series are based on hitherto unused data from a 
Works Progress Administration project which gathered information on build- 
ing permits from local records. They are linked with Bureau of Labor 
Statistics figures for 1920-1950 to provide a complete picture from 1889 on. 
The WPA records provided data for 1920-1929 but, after some consideration, 
the current BLS estimates for this period were retained. 

Except for a brief discussion in Section 2, the monograph is devoted to a 
description of the problems encountered in making the new estimates and the 
manner in which they were resolved. On the whole, it appears to be a very 
workmanlike job. Of considerable interest is the comparison of the new series 
with older ones by Chawner, Long, Wickens, and Colean. The monograph 
provides good grounds for the argument that the new estimates are consider- 
ably more reliable. They generally show from 10 per cent to 20 per cent fewer 
private non-farm housekeeping dwelling unit starts than the older series for 
the years 1900-1920 and about 10 per cent more during the decade of the 
1890’s. 

The monograph also presents new estimates of non-housekeeping dwelling 
unit starts, 1891-1914, and of expenditures for both housekeeping and non- 
housekeeping units. A decline of about 40 per cent in the index (in 1929 
prices) of average expenditure per unit between 1890 and 1950 is noted. Al- 
though the brevity of the treatment may have necessitated the absence of 
any qualification for this finding, it seems well to point out that conclusions 
based on such a figure are apt to be highly tenuous. All researchers using 
index numbers are well aware of the complications involved in comparisons 
over long periods of time and certainly a dwelling unit built in 1890 was a 
rather different product from its 1950 counterpart. 


Econométrie, Colloques Internationaux du Centre National de la Recherche
Scientifique XL, Paris, Editions du Centre de la Recherche Scientifique, 1953. Pp. v,
332. 


R. W. Pfouts, University of North Carolina


This star-studded, French presentation offers the reader a stimulating
group of essays but confronts the reviewer with an extremely difficult
task. The reviewer’s problems arise from the diversity of the topics covered
by the essays; yet the essays have a central theme, the foundations and appli- 
cations of the theory of risk in econometrics. This was the theme of a con- 
ference held in Paris in 1952, and the essays were the contributions of the 
participants in the conference. 

The conference brought together a most distinguished group. The Ameri- 
can representatives were (in order of appearance in the volume) Savage, 
Arrow, Friedman, Samuelson and Marschak. The European participants 
(arranged in the same manner) were Guilbaud, de Finetti, Allais, Wold, 
Massé, Morlat, Frisch, Boiteux, Ville, and Van Dantzig. Other distinguished 
participants made brief comments on the major contributions of those listed. 

The pervasiveness of risk in economic life is indicated by the fact that risk 
was considered in connection with such diverse topics as stocks, income dis- 
tribution, organization, electric costs and tariffs, and credit. But in spite of 
the interest and relevance of these topics, the group devoted more attention 
to subjective probability and the theory of cardinal utility in the manner of 
Von Neumann and Morgenstern than to any other single topic. 

A large part of the discussion of utility and probabilities is devoted to 
an examination of the axioms of the Neumann-Morgenstern development of 
utility. Quite understandably references are made frequently to the work of 
Marschak, Samuelson, and Savage that clarified and extended the Neumann- 
Morgenstern postulates that support the thesis of Bernoulli. In cases where 
the attainment of alternative sums of money or alternative bundles of 
goods is not certain but depends on probabilities, Bernoulli believed that one 
should behave in such a way as to obtain a maximum expectation of utility 
from the money or the goods, that one should maximize his “moral expecta- 
tion.” 

A good deal of the discussion of utility centers on the assumption that
Samuelson has called the axiom of strong independence. This axiom 
states that if A and B are prospects between which the player is indifferent, 
C is a third prospect and p is a probability, then [pA + (1 − p)C] is indifferent
to [pB + (1 − p)C]. From the standpoint of economic theory this statement
appears to lack generality because of the possibility that A and C, say, are 
connected by a complementarity relation stronger than that connecting B 
and C. But from the viewpoint of probability theory, it is elementary that 
the axiom is acceptable because the prospects involved are mutually exclu- 
sive and the possible complementarity will never in fact be realized. 
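
In symbols, the axiom and the reason an expected-utility maximizer satisfies it may be written as follows. The indifference sign and the utility function U are notation assumed here for illustration, not the volume's own; the compound prospect pA + (1 − p)C is read as "A with probability p, otherwise C."

```latex
A \sim B \;\Longrightarrow\; pA + (1-p)C \;\sim\; pB + (1-p)C
\qquad \text{for every prospect } C \text{ and every } p \in [0,1].

% Under the Bernoullian rule a compound prospect is valued by its expected utility:
U\bigl(pA + (1-p)C\bigr) = p\,U(A) + (1-p)\,U(C),

% so U(A) = U(B) gives
p\,U(A) + (1-p)\,U(C) = p\,U(B) + (1-p)\,U(C).
```

Since only one of A and C is ever received, no term involving A and C jointly appears in the expectation, which is the formal counterpart of the remark that the complementarity is never realized.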

Nevertheless the strong independence axiom is, at first acquaintance, likely 
to bring out the schizophrenic in the reader as he views it first from one side 
and then from the other before finally separating the theory of utility under
conditions of certainty of obtaining the desired goods from the stochastic
case in which probabilities of obtaining the goods must be considered.
Samuelson shows that strong independence is a necessary assumption for 
the Bernoullian theory. 

The most vocal opponent of the neo-Bernoullians was Allais. He attacked 
the “American School” from so many different directions that it is almost 
impossible to give a brief summary of his objections other than to report 
that his attitude is one of total rejection. But it may be said that Allais evi- 
dently felt that the basic error of his opponents lies in their neglect of the 
dispersion of the psychological values of the various quantities involved. 

Allais had at hand a substitute for the Bernoullian theory. He would resort 
to the Weber-Fechner method of minimum perceptible differences. On this 
basis he obtains a theoretical preference function that has certain desirable 
properties. To the reviewer this seemed to involve the danger of an unneces- 
sary retreat toward old fashioned notions of measurable utility. 

The reader who gets an unreal feeling from discussions of subjective prob- 
ability and from reading about a world in which a consumer apparently must 
weigh a sixty per cent probability of a dozen shirts against a forty per cent 
probability of a boxer puppy in establishing a preference pattern can draw 
comfort from the volume under review. The authors were well aware of the 
unreal aspects of the Bernoullian theory, and, as indicated, there is consider- 
able controversy over the validity of the entire approach. There are many 
refreshing references to “real men” in the volume especially in the discussions 
of the major papers. 

Probably the element of risk is less important in the theory of utility than 
in any of the other topics discussed. If this is true, it seems unfortunate that 
more attention was not given to some of the other economic applications of 
the theory of risk. Nevertheless, some of the other applications are discussed 
and the discussion of utility goes a long way toward clarifying the Bernoullian 
theory. Probably one unconsciously discounts the importance of the dis- 
cussion of utility because the book appeared after the symposium featuring 
Wold, Shackle, Savage, Manne, Charnes, Samuelson, and Malinvaud that 
appeared in Econometrica.¹ Actually the Paris Conference preceded the ap-
pearance of this symposium, and, no doubt, influenced the views expressed
in the symposium. 

This book probably will not receive as much attention as it deserves from 
English-speaking economists and statisticians because it is written in French. 
This gloomy prospect is offset to some extent by the fact that a good part of 
the material has already been expressed in various articles in English. 

A brief but helpful summary of the proceedings written in English by 
Frisch appears near the end of the volume. Readers who try this sampler 
may find it sufficiently challenging to cause them to refurbish their “reading 
knowledge” of French and try the essays themselves. 





1 Econometrica, 20 (1952), 661-79. 
The Board is happy to announce that membership rose to a new high in 1954
and that it was a successful year financially. The ASA’s continued promotion
campaigns among other learned societies, nominations from ASA members, and 
the reduction in dues to residents outside North America have brought in more 
new members than in any other year since 1947. As a result, membership dues 
have risen to a new high, and this, combined with increases in other sources of 
income, has had the effect of producing a surplus greater than that originally 
budgeted. Thus another accrual to surplus has been achieved. Details of mem- 
bership and finances will be presented in the report of the Secretary-Treasurer. 


Journal of ASA and the American Statistician 


The Journal of the American Statistical Association has been steadily increasing 
in size for the past five years. This is revealed by a comparison between the 1949 
volume and the 1954 volume. In 1949 the total number of pages of the Journal 
was 590, whereas the total number of pages in the 1954 volume comes to 934, 
a rise of about 60%. The increase in funds for the Journal, as voted by the 
Board, has meant more articles per issue, as well as the publication of abstracts 
of papers presented at the Annual Meetings. The practice of publishing the ab- 
stracts is now in its third year (those from the 1954 meeting appear in this issue). 

The American Statistician has also increased in the average number of pages per
issue in 1954. This has provided more space for information about Association
activities, as well as more articles, news of interest to members and other features. 


New Publications 


The new monograph “Statistical Problems of the Kinsey Report” was
completed this year and is now available. This important work may be purchased by
members at a special price substantially below the price to nonmembers. The 
monograph is the result of a study made at the invitation of the National Re- 
search Council, which is sponsoring the work of Dr. Kinsey and his associates. 
A committee composed of William G. Cochran, Frederick Mosteller and John 
W. Tukey studied the statistical methods in Kinsey’s first volume, and this 
monograph is the report of their findings. 

Also available is the “Proceedings of the Business and Economic Statistics 
Section,” a collection of the papers presented under the Section’s sponsorship at 
the 114th Annual Meeting in Montreal. This volume is being sold at a very low 
price to members. It is hoped that the “Proceedings” volume can be published 
each year but this will depend upon the reception of the first issue. 

The newest edition of the Membership Directory was also published in 1954. 
It contains approximately fourteen per cent more names than the previous edi- 
tion which appeared in 1951 and has a geographic breakdown and a listing of the 
members according to the Section of ASA in which they are interested. The ASA 
Constitution, the charters of the sections, and other information about the As- 
sociation are also printed in the new Directory. 
Chapters


During the past year the Board granted charters to two new Chapters of the 
Association—State College, Pennsylvania and the Statistics Section of the Vir- 
ginia Academy of Science. These bring the total number of active chapters to 32. 
Inquiry about forming chapters has come from four more areas and 1955 will 
undoubtedly see a further increase in the number of chapters. 

As a result of the agreement between ASA and the United States Employment 
Service on handling placement for members, each chapter has appointed a person 
to act as liaison between the chapter and the local USES office. This person will 
provide the USES with information and advice on the technical aspects of the 
various fields in which statisticians work, in order to assist the Employment 
Service in the placement of statisticians. 


Sections and Committees 


A charter has been granted to the Section on Physical and Engineering Sciences 
(formerly the Committee on Statistics in the Physical Sciences), which brings 
to five the number of Sections. All five Sections are very active in the formulation 
and presentation of sessions at the Annual Meetings. This provides in the pro- 
gram a wide variety of topics of interest to members. 

The Committee on Monographs and Occasional Papers was established in 
1954 for the purpose of providing a body to implement and review additions to 
the Association’s publishing program. It is expected that this Committee will 
provide an additional impetus to the ever-widening activities of the Association. 

The Board in 1954 reviewed thoroughly the functions, scope and composition 
of the Advisory Committee to the Bureau of the Budget and made certain 
changes which will enhance the value of the Committee to the Bureau. The 
number of members on the Committee was increased from six to nine with the 
specification that a majority of the members must be Presidents, Presidents- 
elect or Past-Presidents of the Association. The remaining members shall be 
present or past officers of the Association. The Board also gave greater freedom
to the Committee in its capacity as an advisory body to the Bureau of the Budget 
and to other governmental departments which may request its assistance. The 
complete report of the Board’s review of this Committee appears in the Minutes 
of the Board of Directors Meeting of May 3, 1954. 


Annual Meeting 


This year, for the first time in many years, the Annual Meeting of the Associa- 
tion was held at a time other than Christmas week. The meeting held in Montreal 
in September 1954 was the first that was ever held outside of the United States. 
The shift from December to September was somewhat of an experiment and was 
made in response to many requests from members that the Annual Meeting be 
held at some time other than Christmas week. The attendance at the Montreal 
meeting was quite good and another September meeting is scheduled for 1956 in 
Detroit. (The 1955 meeting will be held in December in New York City.) A well- 
balanced program was presented at the 1954 meeting under the chairmanship of 
Besse B. Day. A reception given by the City of Montreal for the members was a 
highlight of the Convention. 

The Board has decided to poll the members on their preference for the time of 
the 1957 meeting, which is still open. The questionnaire will be sent early in 1955 
and the announcement of the time chosen will be made in an early issue of The
American Statistician. The questionnaire will present three choices: Spring, Fall
or Christmas week.

Future Annual Meetings of the Association are planned as follows:


1955—New York City, December 27-30 

1956—Detroit, Mich., September 7-10, joint with the American Sociological 
Society 

1957—Open 

1958—Chicago, Ill., December 27-30


Annual Council Meeting 


At its meeting in September 1954, the Board voted to have the annual Council 
Meeting at a separate time from the Annual Meeting of the Association. The 
Council Meeting will take place in Philadelphia on January 7, 1955. For the first 
time the Board has authorized the use of proxies, either in writing or via a 
colleague. One of the reasons for this is to provide Board and Council members 
from the western part of the country with an opportunity to present their views, 
even if they are unable to attend in person. The proxy has all privileges except 
a vote. 

To foster greater rapport between the National Association and its chapters,
and to give some measure of recognition to the work of chapter officials, the
Board has invited chapter presidents to attend the Council Meeting in
Philadelphia. This will give the chapters an opportunity to bring specific
problems to the attention of the Board and Council. It will also give the
chapter presidents an understanding of how the executive body of the
Association functions.


Regional Meetings 


Three Regional Meetings were held during the past year. In April 1954 the 
Chicago Chapter, in cooperation with other area chapters, held a Mid-Western 
Regional Conference which was very successful. A Proceedings volume of the 
Conference was published and copies may be purchased from the Chicago 
Chapter. 

The New York Chapter sponsored a two-day meeting in connection with the 
celebration of the 200th Anniversary of Columbia University in May 1954. 

The West Coast Chapter organized a Western Regional Meeting in Berkeley, 
California in December 1954 in conjunction with the American Association for 
the Advancement of Science and other societies. 

Present plans for 1955 call for a conference on Business Statistics to be spon- 
sored by the Business and Economic Statistics Section and the Wharton School 
of the University of Pennsylvania to be held in the summer. The Section on 
Physical and Engineering Sciences will hold a conference in May 1955 in con- 
junction with the centennial celebration of the College of Engineering of New 
York University. 


REPORT OF THE SECRETARY-TREASURER FOR 1954 


The 1954 promotion campaign, which was directed primarily to members of the 
Royal Statistical Society, the International Statistical Institute and the Inter- 
American Statistical Institute, proved very successful. The number of new
members from other countries more than offset the small loss in membership income
resulting from the reduction of dues to residents outside North America. The 
lowering of dues for persons who have difficulty in obtaining dollar exchange has 
made it much easier for them to obtain the Association’s publications. 

At the beginning of 1954, the membership of ASA was 4,900. The number of new
members for 1954 is approximately 700; added to this figure are 36 others who
reinstated their membership. At the end of 1954, about 480 members were
dropped because of resignation, death, or nonpayment of dues. Thus, the net
increase in membership for 1954 comes to about 250, and the Association begins
1955 with a total of about 5,150 members. Further promotion campaigns in 1955
are expected to continue the growth of ASA.
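
The membership figures in the preceding paragraph can be reconciled with a short calculation. The sketch below is illustrative only: the variable names are assumptions introduced here, and the inputs are the approximate counts quoted in the report, so the result is itself approximate.

```python
# Membership reconciliation for 1954, using the rounded figures quoted in the
# report. All names are illustrative; the report gives approximate counts only.
start_of_year = 4_900   # members at the beginning of 1954
new_members = 700       # approximate number of new members in 1954
reinstated = 36         # members who reinstated their membership
dropped = 480           # approximate resignations, deaths, and nonpayments

net_increase = new_members + reinstated - dropped
end_of_year = start_of_year + net_increase

print(f"Net increase in 1954: {net_increase}")
print(f"Members at the start of 1955: {end_of_year}")
```

With these inputs the net increase works out to 256 and the year-end total to 5,156, which agrees with the report's rounded figures of 250 and 5,150.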

Subscriptions of libraries, business firms, etc., to the Journal of the American 
Statistical Association have also been increasing. The number for 1954 is 1,437, 
as compared with 1,356 in 1953 and 1,248 in 1952. An increase in subscriptions to 
The American Statistician has also been noted. 

The Secretary’s office is happy to announce that almost all copies of the sym- 
posium, “Acceptance Sampling,” which was published in 1950, have been sold. 
The success of this undertaking, the first monograph put out by the Association 
in twenty years, has made it possible to underwrite other monographs, such as 
the “Statistical Problems of the Kinsey Report” and the “Proceedings of the 
Business and Economic Statistics Section.” 

The Financial Report, which is attached, shows that the budgeted surplus for 
1954 will be exceeded by approximately $1,600. Total income for 1954 was 
budgeted at $54,775, whereas the final amount for the full year will be about 
$59,250. This increase is due primarily to income from dues and from the sale 
of publications, although there were increases in most other items of income, as 
well. Total 1954 expense was budgeted at $52,465; the actual final figure is ex- 
pected to approximate $55,260. The cost of publications and the expense of the 
Annual Meeting account for most of the increase in the total expense, with slight 
increases in a number of other items of expense. 

As a result of the 1954 accrual to surplus, the Association begins 1955 with 
approximately $30,000 in total surplus funds. Proposed income for 1955 is 
budgeted at $65,810, while expense is calculated at $63,270. This leaves about 
$2,500 for addition to surplus at the end of 1955. 
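
The budget figures in the two preceding paragraphs can be checked with a few subtractions. The sketch below is illustrative only: the function and variable names are assumptions introduced here, and the 1954 "expected" amounts are the approximate year-end figures quoted in the report, not audited totals.

```python
# Budgeted versus expected surplus, from the dollar figures quoted in the report.
# Names are illustrative; amounts are whole dollars as stated in the text.

def surplus(income: int, expense: int) -> int:
    """Return the excess of income over expense."""
    return income - expense

budgeted_1954 = surplus(54_775, 52_465)   # income and expense as budgeted
expected_1954 = surplus(59_250, 55_260)   # approximate final figures
excess_over_budget = expected_1954 - budgeted_1954
budgeted_1955 = surplus(65_810, 63_270)   # proposed 1955 budget

print(f"1954 budgeted surplus:  ${budgeted_1954:,}")
print(f"1954 expected surplus:  ${expected_1954:,}")
print(f"1954 excess over budget: ${excess_over_budget:,}")
print(f"1955 budgeted surplus:  ${budgeted_1955:,}")
```

On these rounded inputs the 1954 surplus exceeds its budget by $1,680 and the 1955 budget leaves $2,540, in line with the report's "approximately $1,600" and "about $2,500".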

The practice of providing monographs and other nonperiodical publications 
at a reduced price to members has enabled the Association to provide a wide- 
spread distribution at the lowest cost and it is hoped that it will be possible to 
continue this practice. Income from the sale of publications to nonmembers at 
a slightly higher price will provide part of the funds necessary for printing future 
volumes in the expanding publication program. 


March 30, 1955


To the Board of Directors of 
the American Statistical Association: 


I have examined the accompanying financial statements of the American 
Statistical Association relating to the year ended December 31, 1954. My ex- 
amination was made in accordance with generally accepted auditing standards 
and, accordingly, included such tests of the accounting records and other audit- 
ing procedures as were considered necessary in the circumstances. 
The recorded cash receipts for the year were traced to the deposits shown on
the bank statements, and the amounts for dues and subscriptions were tested 
against the membership and subscription records. The paid checks were inspected 
and related vouchers tested in support of cash disbursements for the year. The 
bank balances were reconciled with certificates obtained directly from the de- 
positaries, and the cash on hand was counted and reconciled with the books dur- 
ing the course of the examination. I did not check the membership and subscrip- 
tion records in detail or make any independent verification of the inventory of 
back numbers of Journals, the office records of which are based, in part, on data 
assembled in prior years. 

In accordance with a resolution passed by the Board of Directors, the expense
incurred in publishing a directory, for distribution to the membership beginning
in 1954, is to be spread over a three-year period, although such costs would
appear to be applicable primarily to the year 1954. The accounts for the year
ended December 31, 1954, reflect a charge of $1,597.24, representing the
allocated portion of the directory expense applicable to that period.

In my opinion, the accompanying statements present fairly the financial 
position of the American Statistical Association on December 31, 1954, and the 
results of its operations for the year then ended, in accordance with generally 
accepted accounting principles, except as mentioned in the previous paragraph, 
applied on a basis consistent with that of the preceding year. 

James G. JESTER 


AMERICAN STATISTICAL ASSOCIATION
BALANCE SHEET

Assets
                                                       December 31,
                                                      1954          1953

Cash in banks and on hand (see note)              $59,741.20    $52,431.05
Accounts receivable                                   894.71      2,783.14
Investment in United States Savings Bonds,
  Series G, due 1962, at cost                       3,100.00      3,100.00
Inventory of old Journals, at approximate cost      2,360.99      2,137.69
Inventory of Monograph on Acceptance Sampling,
  [illegible]                                          45.49        129.24
Inventory of Emblems, at cost                         361.50        415.50
Inventory of Monograph on Kinsey Report, at cost    4,482.00
Furniture and fixtures, at cost less accumulated
  depreciation                                      1,908.99      2,088.78
Deferred Charges:
  [label illegible]                                 3,604.48
  [label illegible]                                 1,136.75        945.55

                                                  $77,636.11    $64,030.95


Liabilities and Net Worth
                                                       December 31,
                                                      1954          1953

Accounts payable                                  $17,488.68    $10,430.69
Deferred income (collections applicable to
subsequent years):
  [label illegible]                                18,987.00     16,827.00
  [label illegible]                                 6,535.80      5,817.77
  [label illegible]                                   466.84        466.84

                                                  $25,989.64    $23,111.61

Net worth:
  Life Membership reserve                          $3,796.98     $3,579.92
  Surplus, per statement                           30,360.81     26,908.82

Total net worth                                   $34,157.79    $30,488.74

Total Liabilities and Net Worth                   $77,636.11    $64,030.95





Note: The amount listed as cash in the banks and on hand for 1954 breaks down
as follows:

Checking Account (American Security & Trust Co.)  $17,865.34
Hyattsville Building Assn                           1,013.33
Jefferson Federal Savings & Loan Assn              10,561.33
National Permanent Building Assn                    5,073.00
Liberty Building Assn                              10,195.26
American Building Assn                              5,117.16
Interstate Building Assn                            9,915.78

                                                  $59,741.20


AMERICAN STATISTICAL ASSOCIATION
STATEMENT OF INCOME AND SURPLUS ACCOUNTS

                                                 Year ended December 31,
                                                      1954          1953
Income:
  Dues—Current year                               $40,232.75    $38,607.00
      —Prior year                                     […].00        852.00
  Subscriptions—Journal                               […].22     10,134.80
               —American Statistician                 […].75        443.68
  Advertising—Journal                                 […].82      1,415.99
             —American Statistician                   […].00        263.97
  Sales—Journal                                       […].95      1,937.13
       —American Statistician                         […].95        148.02
       —Acceptance Sampling Monograph                 […].00        302.05
       —Emblems, less cost of sales                   […].75         83.28
       —Biometrics                               [illegible]        568.37
  [label illegible]                                   […].41         45.00
  Mailing list income                                 […].92        911.11
  Interest income                                     […].70        911.84
  Annual meeting                                      […].84      1,559.52
  Reimbursement of overhead expenses:
    Bureau of Mines Project                                       2,207.06
  Miscellaneous income                                […].43         54.12

  Total income                                    $59,209.49    $60,444.94
                                                      1954          1953
Expenses:
  [label illegible]                               $14,368.77    $12,260.07
  Publications—Schedule I                          27,679.05     24,945.73
  Promotion                                           644.85        861.89
  [label illegible]                                 2,400.00      2,400.00
  Travel and secretarial expense                      761.96        800.83
  Supplies                                          1,692.30      2,240.92
  [label illegible]                                 2,106.37      1,849.26
  Telephone and telegraph                           1,128.84        729.02
  Accounting services                                 970.00        970.00
  Committee expense                                 1,191.19      1,269.75
  Annual meeting expense                              996.00        645.99
  Miscellaneous expenses—Schedule I                 1,818.17      1,528.00

  Total expenses                                  $55,757.50    $50,501.46

Excess of income over expenses for the year        $3,451.99     $9,943.48
Add: Surplus account at beginning of year          26,908.82     16,965.34

Surplus account at end of year                    $30,360.81    $26,908.82


AMERICAN STATISTICAL ASSOCIATION
Schedule I

                                                 Year ended December 31,
                                                      1954          1953
Publications:
  Journal—Printing                                $16,994.75    $15,118.77
         —Abstracts                              [illegible]        750.00
         —Editorial expense                      [illegible]      1,215.30
         —Cost of old Journals                   [illegible]        188.67
         —Delivery charges                       [illegible]         36.11
         —Storage charges                        [illegible]        114.00

                                                  $19,683.11    $17,422.85
  American Statistician                             6,314.95      6,584.28
  Acceptance Sampling Monograph                        83.75        129.24
  Membership Directory, allocated expense
    less sales                                      1,597.24        809.36

  Total publications                              $27,679.05    $24,945.73

Miscellaneous Expenses:
  Life Membership expense                             $50.06        $90.13
  Depreciation                                        578.56        617.64
  Dues to other organizations                         123.25        123.25
  [label illegible]                                     2.53          4.50
  Repairs and maintenance                             608.34        314.89
  Insurance                                           128.68        153.25
  [label illegible]                                   220.21         24.53
  [label illegible]                                   106.54        199.81

                                                   $1,818.17     $1,528.00